Cassandra uses a storage structure similar to a Log-Structured Merge Tree, unlike a typical relational database, which uses a B-tree. The storage engine writes sequentially to disk in append mode and stores data contiguously. Operations run in parallel across nodes and within an individual machine. Because Cassandra does not use a B-tree and never updates existing on-disk data when writing, concurrency control over on-disk data is unnecessary.
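The log-structured idea can be illustrated with a toy store. This is only a sketch under stated assumptions, not Cassandra's implementation: the class name TinyLSM, the memtable_limit threshold, and the in-memory "segments" standing in for on-disk files are all hypothetical. It shows that a write touches only an in-memory table, that a flush produces a new sorted, immutable segment, and that reads prefer the newest data.

```python
import bisect


class TinyLSM:
    """Toy log-structured store: a mutable memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # in-memory writes, latest value per key
        self.segments = []                    # flushed segments, oldest first

    def put(self, key, value):
        # A write only touches memory; nothing already flushed is modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out once, sorted by key, as a new immutable segment.
        if self.memtable:
            self.segments.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None


store = TinyLSM(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)      # reaches the limit, so the memtable is flushed
store.put("a", 10)     # the update lands in a fresh memtable, not in the old segment
print(store.get("a"))  # 10
print(store.segments)  # [(('a', 1), ('b', 2))]
```

The old value of "a" is still present in the first segment; it is superseded by newer data rather than overwritten, which is why no in-place update is needed.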
Cassandra accommodates modern solid-state disks (SSDs) extremely well. Inexpensive consumer SSDs are fine for use with Cassandra because Cassandra minimizes wear and tear on an SSD. The disk I/O performed by Cassandra is minimal.
When database operations are serial, throughput and latency are interchangeable: each one determines the other. Cassandra operations are performed in parallel, so throughput and latency are independent. Unlike most databases, Cassandra achieves both excellent throughput and low latency.
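The relationship between throughput and latency can be made concrete with a little arithmetic. The sketch below assumes an idealized model in which a fixed number of requests is always in flight and each takes the same time; the function name and the figures (2 ms latency, 32 concurrent requests) are illustrative assumptions, not measurements of Cassandra.

```python
def ops_per_second(latency_ms, concurrency=1):
    """Idealized throughput when `concurrency` requests are always in flight."""
    return concurrency * 1000 / latency_ms


# Serial execution: throughput is simply the reciprocal of latency.
serial = ops_per_second(latency_ms=2)

# Parallel execution: the same per-request latency yields higher throughput.
parallel = ops_per_second(latency_ms=2, concurrency=32)

print(serial)    # 500.0
print(parallel)  # 16000.0
```

In the serial case, knowing one number gives the other. Once requests overlap, throughput can rise without any change in per-request latency, so the two have to be measured separately.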
Writes are very efficient in Cassandra and very inefficient in storage engines that scatter random writes around while making in-place updates. With in-place updates, many random writes of small amounts of data force the SSD to read in a whole sector, update the bytes, and rewrite the entire sector back out. Cassandra avoids this cycle: as database updates are received, it does not overwrite rows in place, which would require random I/O, but appends the new data sequentially instead of modifying the data on disk. Its log-structured design obviates the need for disk seeks when writing, so no random seeking occurs as it does in relational databases. Eliminating on-disk data modification and erase-block cycles prolongs the life of the SSD and saves time: one or two milliseconds.
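The difference between appending and updating in place can be sketched with an ordinary file. This is an illustration only and not Cassandra's on-disk format: the record layout (key=value lines), the file name rows.log, and both helper functions are hypothetical. The point is that an "update" adds bytes at the end of the file and leaves every earlier byte untouched.

```python
import os
import tempfile


def append_record(path, key, value):
    """Append one key=value record to the end of the log; return its start offset."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as log:  # append mode: writes always go to the end
        log.write(f"{key}={value}\n".encode("utf-8"))
    return offset


def read_latest(path, key):
    """Scan the log and return the most recently appended value for key."""
    latest = None
    with open(path, "rb") as log:
        for line in log:
            k, _, v = line.decode("utf-8").rstrip("\n").partition("=")
            if k == key:
                latest = v
    return latest


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "rows.log")  # hypothetical file name
    append_record(log_path, "user1", "alice")
    with open(log_path, "rb") as f:
        before = f.read()
    second_offset = append_record(log_path, "user1", "alicia")  # update = another append
    with open(log_path, "rb") as f:
        after = f.read()
    unchanged = after.startswith(before)  # earlier bytes were never rewritten
    latest = read_latest(log_path, "user1")

print(unchanged, latest)  # True alicia
```

A real engine would not scan the whole file on every read; the linear scan here just keeps the example short.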
Cassandra does not take locks on the fast write request path, because locking there would negatively affect throughput. Because there is no modification of data on disk, locking for concurrency control of data on disk is unnecessary. The operational design integrates nicely with the operating system page cache. Because Cassandra does not modify data that has already been written, dirty pages that would have to be flushed are not even generated.
Using SSDs instead of rotational disks is necessary for achieving low latency. Cassandra runs the same code on every node and has no master node and no single point of failure, which also helps achieve high throughput.
Cassandra 1.1 and later releases provide fine-grained control of table storage on disk, writing tables to disk using separate table directories within each keyspace directory. Data files are stored using this directory and file naming format:
The new file name format includes the keyspace name, so you can tell which keyspace and table a file belongs to when streaming or bulk loading data. Cassandra creates a subdirectory for each table, which allows you to symlink a table to a chosen physical drive or data volume. This lets you move very active tables to faster media, such as SSDs, for better performance, and also spread tables across all attached storage devices for better I/O balance at the storage layer.
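Because each table has its own directory, relocating a table amounts to moving that directory and leaving a symlink behind. The sketch below shows the idea on temporary directories. The function relocate_table, the keyspace ks1, the table users, and the file sample.db are hypothetical names chosen for illustration, and the sketch assumes a platform where creating symlinks is permitted. It also assumes nothing is writing to the directory during the move; in practice that would presumably mean doing this while the node is stopped.

```python
import os
import shutil
import tempfile


def relocate_table(data_dir, keyspace, table, target_volume):
    """Move a table directory to another volume and leave a symlink in its place."""
    table_dir = os.path.join(data_dir, keyspace, table)
    new_home = os.path.join(target_volume, keyspace, table)
    os.makedirs(os.path.dirname(new_home), exist_ok=True)
    shutil.move(table_dir, new_home)                           # move the files
    os.symlink(new_home, table_dir, target_is_directory=True)  # keep the old path valid
    return new_home


with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "data")  # stands in for the data directory
    ssd = os.path.join(root, "ssd")        # stands in for a faster volume
    os.makedirs(os.path.join(data_dir, "ks1", "users"))
    os.makedirs(ssd)
    sample = os.path.join(data_dir, "ks1", "users", "sample.db")  # placeholder file
    with open(sample, "w") as f:
        f.write("x")

    new_home = relocate_table(data_dir, "ks1", "users", ssd)
    is_link = os.path.islink(os.path.join(data_dir, "ks1", "users"))
    still_readable = os.path.exists(sample)  # the old path resolves through the symlink
    moved = os.path.exists(os.path.join(new_home, "sample.db"))

print(is_link, still_readable, moved)  # True True True
```

The original path keeps working, so anything that reads from the per-table directory does not need to know the files now live on a different volume.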