Apache Cassandra 1.2 Documentation

About reads

Unlike most databases, Cassandra can perform random reads from SSDs in parallel with extremely low latency. Rotational disks are not recommended.

Cassandra reads and writes data by primary key, which avoids the complex queries that a relational database requires. First, Cassandra checks the Bloom filter. Each SSTable has an associated Bloom filter that indicates, before any disk I/O, whether the SSTable might contain data for the requested row. A negative answer is definitive, so Cassandra skips that SSTable; a positive answer can occasionally be a false positive.
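The following minimal Bloom filter sketch shows why the check is cheap and one-sided: a key that was added always tests positive, while a key that was never added usually tests negative. The class, the SHA-256-based hashing, and the fixed sizes are illustrative assumptions, not Cassandra's implementation; Cassandra sizes its filters from the per-table bloom_filter_fp_chance setting.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus num_hashes hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def sstables_to_read(sstables, key):
    """Names of the (hypothetical) SSTables whose filter does not rule out key."""
    return [name for name, bloom in sstables if bloom.might_contain(key)]


sst1 = BloomFilter(num_bits=1024, num_hashes=3)
sst1.add("alice")
sst2 = BloomFilter(num_bits=1024, num_hashes=3)
sst2.add("bob")
print(sstables_to_read([("sst1", sst1), ("sst2", sst2)], "alice"))
```

Only SSTables that survive this check are candidates for disk I/O; a false positive costs one unnecessary lookup but never returns wrong data.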

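Next, Cassandra checks the global key cache, which stores the location of a row within an SSTable data file. If the requested row key is not in the key cache, Cassandra performs a binary search of the index summary to find the closest sampled entry, then reads the primary index from that point to locate the row. By default, 1 row key out of every 128 is sampled from the primary index to create the index summary. You configure the sample frequency by changing the index_interval property in the cassandra.yaml file. Increasing index_interval shrinks the in-memory index summary at the cost of scanning more primary-index entries per lookup; in many cases you can raise it to 512 without noticeable degradation.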
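A minimal sketch of the key cache and index summary lookup, assuming the primary index is an in-memory sorted list of (row_key, data_file_offset) pairs and the key cache is a plain dict. The function names are hypothetical; the point is that a binary search over the sampled keys narrows the scan to at most index_interval primary-index entries.

```python
import bisect


def build_index_summary(primary_index, index_interval=128):
    """Sample every index_interval-th entry of a sorted primary index.

    primary_index: sorted list of (row_key, data_file_offset) pairs.
    Returns (sampled_keys, positions); positions[i] is where sampled_keys[i]
    sits in primary_index.
    """
    positions = list(range(0, len(primary_index), index_interval))
    return [primary_index[p][0] for p in positions], positions


def find_row_offset(primary_index, summary, key, key_cache):
    """Return the data-file offset of key, or None if this SSTable lacks it."""
    if key in key_cache:                      # key cache hit: skip the index entirely
        return key_cache[key]
    sampled_keys, positions = summary
    i = bisect.bisect_right(sampled_keys, key) - 1   # closest sampled key <= key
    if i < 0:
        return None                           # key sorts before every indexed key
    start = positions[i]
    end = positions[i + 1] if i + 1 < len(positions) else len(primary_index)
    # Scan at most index_interval primary-index entries from the sampled position.
    for row_key, offset in primary_index[start:end]:
        if row_key == key:
            key_cache[key] = offset           # remember the location for next time
            return offset
    return None


index = [(f"key{n:04d}", n * 100) for n in range(1000)]
summary = build_index_summary(index, index_interval=128)
cache = {}
print(len(summary[0]))                                    # 8 samples for 1000 keys
print(find_row_offset(index, summary, "key0300", cache))  # 30000
```

With index_interval=512 the same 1000 keys produce only 2 samples, which illustrates the memory-versus-scan-length trade-off described above.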

Finally, if the requested columns are contiguous, Cassandra performs a single seek followed by a sequential read of the columns (a range read) in the SSTable, and returns the result set.
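Because a row's columns are stored sorted by name, a contiguous range can be served by one positioning step and one forward scan. The sketch below models a row as a sorted Python list; the binary search stands in for the disk seek and the loop for the sequential read. The function name and data layout are illustrative assumptions.

```python
import bisect


def slice_columns(row, start, end):
    """Return the (name, value) columns with start <= name <= end.

    row is a list of (column_name, value) pairs sorted by column name, standing
    in for a row's columns laid out contiguously in an SSTable.
    """
    # "Seek": binary search for the first column whose name is >= start.
    # A 1-tuple (start,) sorts before every (start, value) pair.
    i = bisect.bisect_left(row, (start,))
    result = []
    # Sequential read: walk forward until a column name passes end.
    while i < len(row) and row[i][0] <= end:
        result.append(row[i])
        i += 1
    return result


row = [("address", "12 Main St"), ("email", "j@example.com"),
       ("name", "jsmith"), ("phone", "555-0100")]
print(slice_columns(row, "email", "name"))
# [('email', 'j@example.com'), ('name', 'jsmith')]
```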

[Figure: caching reads]

Disk reads take place at the block level. One disk read of the index block corresponds to the closest sampled entry. Cassandra reads a row plus a selection of columns or a range of columns. This process, together with fast lookup of data through primary and secondary indexes, makes Cassandra perform well on reads compared to other storage systems, even for read-heavy workloads. Nodes also start up faster: because Cassandra samples SSTable indexes and loads only the samples into memory caches, it does not need to read through the whole primary index, which dramatically reduces SSTable index load time.

Reading a clustered row

With a CQL 3 schema, Cassandra's storage engine uses compound columns to store clustered rows. All logical rows with the same partition key are stored as a single physical row. Within a partition, not all rows are equally expensive to query. The very beginning of the partition (the first rows, in the clustering order defined by your key) is slightly less expensive to query because there is no need to consult the partition-level index. For more information about clustered rows, see Compound keys and clustering.
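The mapping from CQL 3 logical rows to one physical row per partition can be sketched as follows, assuming a hypothetical table with PRIMARY KEY (user_id, tweet_id). Each non-key value becomes a cell whose compound column name is (clustering value, column name). This is a simplification; the real on-disk format also records timestamps and other per-cell metadata.

```python
from collections import defaultdict


def to_physical_rows(logical_rows, partition_key, clustering_key):
    """Group CQL 3 logical rows into one physical row per partition key.

    Each physical row is a sorted list of ((clustering_value, column_name), value)
    cells, so logical rows in the same partition sit together in clustering order.
    """
    partitions = defaultdict(dict)
    for row in logical_rows:
        pk, ck = row[partition_key], row[clustering_key]
        for column, value in row.items():
            if column not in (partition_key, clustering_key):
                partitions[pk][(ck, column)] = value   # compound column name
    return {pk: sorted(cells.items()) for pk, cells in partitions.items()}


# Hypothetical table: PRIMARY KEY (user_id, tweet_id)
logical = [
    {"user_id": "gmason", "tweet_id": 2, "body": "second post"},
    {"user_id": "gmason", "tweet_id": 1, "body": "first post"},
    {"user_id": "jbellis", "tweet_id": 7, "body": "hello"},
]
physical = to_physical_rows(logical, "user_id", "tweet_id")
print(physical["gmason"])
# [((1, 'body'), 'first post'), ((2, 'body'), 'second post')]
```

Both gmason rows land in one physical row, ordered by tweet_id, which is why the first rows of a partition are the cheapest to reach.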

About the read path

When a read request for a row arrives at a node, Cassandra must combine the row from all SSTables on that node that contain columns from the row in question, as well as from any unflushed memtables, to produce the requested data. This diagram depicts the read path of a read request, continuing the example in The write path of an update:


[Figure: read path of a read request]

For example, suppose you have a row of user data and need to update the user's email address. Cassandra doesn't rewrite the entire row into a new data file; it writes only the new email address to the new data file. The user name and password remain in the old data file.

The red lines in the SSTables in this diagram are fragments of a row that Cassandra must combine to give the user the requested result. When the row cache is enabled, Cassandra caches the merged row, not the raw row fragments, which saves CPU and disk I/O on later reads.


[Figure: read path]
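The email example can be sketched as a timestamp-based merge: each memtable or SSTable contributes a fragment of the row, and for each column the value with the newest write timestamp wins. This minimal sketch assumes fragments are dicts of column name to (value, timestamp); tombstones, TTLs, and tie-breaking between equal timestamps are not modeled.

```python
def merge_fragments(fragments):
    """Combine row fragments; for each column the newest timestamp wins.

    fragments: one dict per memtable or SSTable holding part of the row,
    mapping column name -> (value, write_timestamp).
    """
    newest = {}
    for fragment in fragments:
        for column, (value, ts) in fragment.items():
            if column not in newest or ts > newest[column][1]:
                newest[column] = (value, ts)
    return {column: value for column, (value, _) in newest.items()}


old_sstable = {"name": ("jsmith", 100), "password": ("s3cret", 100),
               "email": ("jsmith@old.example", 100)}
new_sstable = {"email": ("jsmith@new.example", 200)}
print(merge_fragments([old_sstable, new_sstable]))
# {'name': 'jsmith', 'password': 's3cret', 'email': 'jsmith@new.example'}
```

Because the winner is chosen by timestamp rather than by the order in which fragments are visited, the result is the same whichever SSTable is read first.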

The row cache is a write-through cache: if you update a row that is cached, Cassandra updates the cached copy too, so subsequent reads of that row still don't need to merge fragments again.
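A toy model of the write-through behavior described above: the expensive merge runs once on the first read, and later writes patch the cached merged row instead of forcing another merge. The class and its merge_from_storage callback are hypothetical; capacity limits, eviction, and the commit log and memtable side of a write are not modeled.

```python
class WriteThroughRowCache:
    """Toy row cache holding merged rows; writes also update cached rows.

    merge_from_storage is a hypothetical callback that performs the expensive
    merge of memtable and SSTable fragments for a row key.
    """

    def __init__(self, merge_from_storage):
        self._merge = merge_from_storage
        self._rows = {}
        self.merges = 0          # counts how often the expensive merge ran

    def read(self, key):
        if key not in self._rows:
            self.merges += 1
            self._rows[key] = dict(self._merge(key))
        return dict(self._rows[key])      # hand out a copy, not the cached dict

    def write(self, key, columns):
        # A real write also goes to the commit log and memtable (not modeled).
        if key in self._rows:
            self._rows[key].update(columns)   # keep the cached merged row current


storage = {"jsmith": {"name": "jsmith", "email": "jsmith@old.example"}}
cache = WriteThroughRowCache(lambda key: storage[key])
cache.read("jsmith")
cache.write("jsmith", {"email": "jsmith@new.example"})
print(cache.read("jsmith")["email"], cache.merges)   # jsmith@new.example 1
```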

For a detailed explanation of how client read and write requests are handled in Cassandra, also see About client requests.

How write patterns affect reads

The compaction strategy Cassandra applies to your data is configurable and can significantly affect read performance. The SizeTieredCompactionStrategy tends to fragment data when rows are frequently updated, because the pieces of a row end up spread across many SSTables that a read must consult. The LeveledCompactionStrategy (LCS) was designed to prevent fragmentation under this condition. For more information about LCS, see the article Leveled Compaction in Apache Cassandra.
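A rough model of why the strategy matters for reads: under size-tiered compaction any SSTable may hold a fragment of a frequently updated row, while under leveled compaction the SSTables within a level cover non-overlapping key ranges, so each level contributes at most one candidate. Both functions are hypothetical toys that count candidate SSTables for one key; they ignore Bloom filters and level 0, whose freshly flushed SSTables may overlap.

```python
def sstables_per_read_size_tiered(sstables, key):
    """Toy size-tiered model: any SSTable may hold a fragment of any row.

    sstables: list of sets of row keys, one set per SSTable.
    """
    return sum(1 for keys in sstables if key in keys)


def sstables_per_read_leveled(levels, key):
    """Toy leveled model: SSTables within a level cover non-overlapping key
    ranges, so each level contributes at most one candidate SSTable.

    levels: list of levels; each level is a list of (first_key, last_key) ranges.
    """
    return sum(1 for level in levels for first, last in level if first <= key <= last)


# Row 42 was updated across several flushes.
size_tiered = [{42, 7}, {42}, {42, 9}, {42}, {1}]
leveled = [[(0, 39), (40, 99)], [(0, 19), (20, 59), (60, 99)]]
print(sstables_per_read_size_tiered(size_tiered, 42))  # 4
print(sstables_per_read_leveled(leveled, 42))          # 2
```

In the leveled model the count is bounded by the number of levels no matter how often row 42 is updated.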

How the row cache affects reads

As with any database, reads are fastest when the most in-demand data (the hot working set) fits into memory. Although all modern storage systems rely on some form of caching for fast access to hot data, not all of them degrade gracefully when the cache capacity is exceeded and disk I/O is required. Cassandra's read performance benefits from built-in caching: for frequently accessed rows, Cassandra provides a built-in key cache and an optional row cache.

How compaction and compression affect reads

To keep read performance from deteriorating as SSTables accumulate, compaction runs in the background and merges SSTables using sequential rather than random I/O. Compression maximizes the storage capacity of nodes and reduces disk I/O, particularly for read-dominated workloads.
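Compaction can run without random I/O because SSTables are sorted: merging them is a k-way merge that reads each input front to back. The sketch below assumes each SSTable is a list of (row_key, timestamp, value) tuples sorted by key and keeps the newest version of each key; real compaction merges at the column level and also handles tombstones, which this simplification omits.

```python
import heapq


def compact(*sstables):
    """Merge sorted SSTables into one, reading every input sequentially.

    Each SSTable is a list of (row_key, timestamp, value) tuples sorted by
    row_key. When a key appears more than once, the newest timestamp wins.
    """
    merged = []
    # heapq.merge streams each input front to back: no random I/O is needed.
    for key, ts, value in heapq.merge(*sstables, key=lambda entry: entry[0]):
        if merged and merged[-1][0] == key:
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)
        else:
            merged.append((key, ts, value))
    return merged


older = [("alice", 1, "a@old"), ("carol", 1, "c@old")]
newer = [("alice", 2, "a@new"), ("bob", 2, "b@new")]
print(compact(older, newer))
# [('alice', 2, 'a@new'), ('bob', 2, 'b@new'), ('carol', 1, 'c@old')]
```

After compaction, a read for alice consults one SSTable instead of two.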

When I/O activity increases because of increased read load, the typical remedy is to add more nodes to the cluster. Because Cassandra compresses data files in chunks, it decompresses only the chunks a read needs rather than the whole data file, which keeps compression transparent to applications.
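A minimal sketch of chunked compression, assuming zlib (DEFLATE) as a stand-in for whichever compressor a table is configured with and a list index standing in for the chunk-offset metadata Cassandra keeps for each compressed SSTable. In Cassandra the chunk size is a per-table setting (the chunk_length_kb compression option); the 1024-byte chunks here are only for illustration.

```python
import zlib


def compress_in_chunks(data, chunk_size):
    """Compress data as independent fixed-size chunks."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def read_range(chunks, chunk_size, start, length):
    """Return (bytes, chunks_decompressed) for data[start:start + length].

    Only the chunks overlapping the requested range are decompressed.
    """
    if length <= 0:
        return b"", 0
    first = start // chunk_size
    last = min((start + length - 1) // chunk_size, len(chunks) - 1)
    buf = b"".join(zlib.decompress(chunks[i]) for i in range(first, last + 1))
    offset = start - first * chunk_size
    return buf[offset:offset + length], last - first + 1


data = bytes(range(256)) * 16                 # 4096 bytes of sample data
chunks = compress_in_chunks(data, chunk_size=1024)
piece, touched = read_range(chunks, 1024, start=1500, length=100)
print(piece == data[1500:1600], touched)      # True 1
```

A read that straddles a chunk boundary decompresses two chunks; a smaller chunk size wastes less work per read but compresses less effectively.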