To manage and access data in Cassandra, it is important to understand how Cassandra writes and reads data, how the hinted handoff feature works, and where Cassandra conforms and does not conform to the ACID (atomic, consistent, isolated, durable) database properties. In Cassandra, consistency refers to how up-to-date and synchronized a row of data is on all of its replicas.
Cassandra includes client utilities and application programming interfaces (APIs) for developing applications for data storage and retrieval.
Cassandra delivers high availability for writing through its data replication strategy. Cassandra duplicates data on multiple peer nodes to ensure reliability and fault tolerance. Relational databases, on the other hand, typically structure tables to keep data duplication at a minimum. The relational database server has to do additional work to ensure data integrity across the tables. In Cassandra, maintaining integrity between related tables is not an issue. Cassandra tables are not related. Usually, Cassandra performs better on writes than relational databases.
When a write occurs, Cassandra stores the data in a structure in memory, the memtable, and also appends writes to the commit log on disk, providing configurable durability.
The commit log receives every write made to a Cassandra node, and these durable writes survive permanently even after hardware failure.
The more a table is used, the larger its memtable needs to be. Cassandra can dynamically allocate the right amount of memory for the memtable, or you can manage the amount of memory being used yourself. When memtable contents exceed a configurable threshold, the memtable data, which includes indexes, is put in a queue to be flushed to disk. You can configure the length of the queue by changing memtable_flush_queue_size in the cassandra.yaml file. If the data to be flushed exceeds the queue size, Cassandra blocks writes.
The memtable data is flushed to SSTables on disk using sequential I/O. Data in the commit log is purged after its corresponding data in the memtable is flushed to the SSTable.
Memtables and SSTables are maintained per table. SSTables are immutable, not written to again after the memtable is flushed. Consequently, a row is typically stored across multiple SSTable files.
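The write path described above can be sketched as a toy model. This is an illustration only, not Cassandra's implementation: the class ToyNode, its flush_threshold parameter (counted in rows rather than bytes), and the read method are hypothetical names introduced here. The sketch also simplifies in two labeled ways: it flushes synchronously instead of using a flush queue, and it merges SSTables by flush order, whereas real reconciliation of column values is more involved.

```python
class ToyNode:
    """Toy model of one table's write path (assumption: not Cassandra's real code)."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical: max rows held in memory
        self.commit_log = []   # append-only record of every unflushed write
        self.memtable = {}     # row key -> {column name: value}
        self.sstables = []     # flushed snapshots, never modified afterwards

    def write(self, key, columns):
        # Every write is appended to the commit log and applied to the memtable.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write rows out in sorted key order as an immutable structure (tuples).
        sstable = tuple(
            (key, tuple(sorted(self.memtable[key].items())))
            for key in sorted(self.memtable)
        )
        self.sstables.append(sstable)
        # Once the memtable is flushed, its commit log data can be purged.
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        # A row can be spread over several SSTables plus the memtable.
        # Simplification: later flushes and the memtable overwrite older values.
        merged = {}
        for sstable in self.sstables:
            for row_key, columns in sstable:
                if row_key == key:
                    merged.update(columns)
        merged.update(self.memtable.get(key, {}))
        return merged


node = ToyNode(flush_threshold=2)
node.write("k1", {"c1": "v1"})
node.write("k2", {"c1": "v1", "c2": "v2"})   # second row reaches the threshold: flush
node.write("k1", {"c1": "v4", "c3": "v3"})   # k1 now lives in an SSTable and the memtable
row = node.read("k1")
```

After these three writes, the toy node holds one SSTable, one unflushed commit log entry, and a row k1 whose columns come from both the SSTable and the memtable.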
For each SSTable, Cassandra creates these in-memory structures:
Partition index: A list of primary keys and the start position of rows in the data file.
Partition summary: A subset of the partition index. By default, 1 primary key out of every 128 is sampled.
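The relationship between the two structures can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the fixed row_size, and ordering by plain key (rather than by token) are all hypothetical, and the lookup routine shows one plausible way a sampled subset narrows a search, not Cassandra's actual read code.

```python
import bisect


def build_partition_index(sorted_keys, row_size=100):
    # Full list of (primary key, start position in the data file).
    # Assumption: every row occupies row_size bytes, purely for illustration.
    return [(key, i * row_size) for i, key in enumerate(sorted_keys)]


def build_summary(partition_index, sample_every=128):
    # Keep 1 entry out of every sample_every: (key, position in the partition index).
    return [
        (partition_index[i][0], i)
        for i in range(0, len(partition_index), sample_every)
    ]


def lookup(key, summary, partition_index, sample_every=128):
    # Find the last sampled key <= key, then scan only that slice of the index.
    sampled_keys = [k for k, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first key in this SSTable
    start = summary[slot][1]
    for candidate, offset in partition_index[start:start + sample_every]:
        if candidate == key:
            return offset
    return None


keys = [f"key{i:05d}" for i in range(1000)]
index = build_partition_index(keys)
summary = build_summary(index)
offset = lookup("key00300", summary, index)
```

With 1000 keys and the default sampling of 1 in 128, the summary holds 8 entries, so a lookup scans at most 128 index entries instead of all 1000.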
In the memtable, data is organized in sorted order.
For efficiency, Cassandra does not repeat the names of the columns in memory or in the SSTable. For example, the following writes occur:
write (k1, c1:v1)
write (k2, c1:v1 c2:v2)
write (k1, c1:v4 c3:v3 c2:v2)
In the memtable, Cassandra stores this data after receiving the writes:
k1 c1:v4 c2:v2 c3:v3
k2 c1:v1 c2:v2
In the commit log on disk, Cassandra stores this data after receiving the writes:
k1, c1:v1
k2, c1:v1 c2:v2
k1, c1:v4 c3:v3 c2:v2
In the SSTable on disk, Cassandra stores this data after flushing the memtable:
k1 c1:v4 c2:v2 c3:v3
k2 c1:v1 c2:v2
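The example can be replayed in a few lines to show why the commit log and the memtable hold different contents. This is a minimal sketch that uses plain Python dictionaries as stand-ins; the render helper is a hypothetical name introduced here for display.

```python
writes = [
    ("k1", {"c1": "v1"}),
    ("k2", {"c1": "v1", "c2": "v2"}),
    ("k1", {"c1": "v4", "c3": "v3", "c2": "v2"}),
]

commit_log = []  # keeps every write, in arrival order
memtable = {}    # keeps one merged entry per row key

for key, columns in writes:
    commit_log.append((key, columns))
    # A later value for the same column replaces the earlier one.
    memtable.setdefault(key, {}).update(columns)


def render(table):
    # Show rows sorted by key, and columns sorted by name within each row.
    return [
        f"{key} " + " ".join(f"{name}:{value}" for name, value in sorted(table[key].items()))
        for key in sorted(table)
    ]


memtable_view = render(memtable)
```

The commit log retains all three writes in order, while the memtable view contains one sorted line per row with the latest value of each column, which is also what the flushed SSTable holds.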
To update an index, Cassandra appends data to the commit log, updates the memtable, and updates the index. Writing to a table that has an index involves more work than writing to a table without one, but the update process has been improved in Cassandra 1.2: the synchronization lock previously needed to prevent concurrency issues under heavy insert loads has been removed.
When a column is updated, the index is updated. If the old column value was still in the memtable, which typically occurs when updating a small set of rows repeatedly, Cassandra removes the index entry; otherwise, the old entry remains to be purged by compaction. If a read sees a stale index entry before compaction purges it, the reader thread invalidates it.
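The handling of stale index entries can be illustrated with a toy model. Everything here is an assumption made for illustration: the class ToyIndexedTable, its single indexed column, and the dictionary-based index are hypothetical, and compaction is not modeled at all. The sketch only shows the two cases described above: immediate removal when the old value is still in the memtable, and invalidation at read time otherwise.

```python
class ToyIndexedTable:
    """Toy model of an index on one column (assumption: not Cassandra's real code)."""

    def __init__(self):
        self.memtable = {}  # row key -> indexed column value, not yet flushed
        self.flushed = {}   # row key -> indexed column value already on disk
        self.index = {}     # indexed value -> set of row keys

    def write(self, key, value):
        if key in self.memtable:
            # Old value is still in the memtable: remove its index entry now.
            self.index.get(self.memtable[key], set()).discard(key)
        # Otherwise the old entry, if any, stays behind as a stale entry.
        self.memtable[key] = value
        self.index.setdefault(value, set()).add(key)

    def flush(self):
        self.flushed.update(self.memtable)
        self.memtable = {}

    def current_value(self, key):
        return self.memtable.get(key, self.flushed.get(key))

    def find(self, value):
        matches = []
        # sorted() copies the set, so entries can be discarded during the loop.
        for key in sorted(self.index.get(value, set())):
            if self.current_value(key) == value:
                matches.append(key)
            else:
                self.index[value].discard(key)  # reader invalidates the stale entry
        return matches


table = ToyIndexedTable()
table.write("u1", "NY")
table.write("u1", "LA")               # NY entry removed immediately
table.flush()
table.write("u1", "SF")               # LA entry is left behind as stale
stale_before_read = set(table.index["LA"])
found_la = table.find("LA")           # the read notices and invalidates it
found_sf = table.find("SF")
```

In this run, the entry for NY disappears at write time, while the entry for LA survives until a read on LA finds that the row no longer matches and drops it.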
As with relational databases, keeping indexes up to date is not free, so unnecessary indexes should be avoided.