Data caching
Cassandra includes integrated caching and distributes cache data around the cluster for you. When a node goes down, the client can read from another cached replica of the data. The integrated architecture also simplifies troubleshooting because there is no separate caching tier, and cached data exactly matches what is in the database. The integrated cache also solves the cold start problem: Cassandra periodically saves the cache to disk and reads its contents back in when a node restarts, so you never have to start with a cold cache.
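The save-and-reload idea behind avoiding a cold start can be illustrated with a minimal Python sketch. This is a toy model, not Cassandra's implementation: the class name PersistentCache, the JSON snapshot format, and the file name saved_cache.json are all assumptions made for illustration.

```python
import json
import os
import tempfile


class PersistentCache:
    """Toy cache that snapshots itself to disk (hypothetical, for illustration)."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        # On startup, warm the cache from the last snapshot if one exists.
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    def save(self):
        # Write to a temporary file first, then rename it into place,
        # so a crash mid-write never leaves a half-written snapshot.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f)
        os.replace(tmp, self.path)


snapshot = os.path.join(tempfile.mkdtemp(), "saved_cache.json")

cache = PersistentCache(snapshot)        # first start: no snapshot, cache is empty
cache.entries["user:42"] = {"first_name": "Ada"}
cache.save()                             # a real system would do this on a timer

restarted = PersistentCache(snapshot)    # simulates a restart: cache starts warm
```

After the simulated restart, the new instance already holds the entry that was cached before shutdown, which is the behavior the paragraph above describes.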
The partition key cache is a cache of the partition index for a Cassandra table. Using the key cache instead of relying on the OS page cache saves CPU time and memory. However, enabling just the key cache results in disk (or OS page cache) activity to actually read the requested data rows.
The row cache is similar to a traditional cache like memcached: when a row is accessed, the entire row is pulled into memory (merging from multiple SSTables if necessary) and cached so that further reads against that row can be satisfied without hitting disk at all.
Typically, you enable either the partition key or row cache for a table. The main exception is for archive tables that are infrequently read. You should disable caching entirely for archive tables.
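For example, this CQL statement creates a table with both the partition key cache and the row cache enabled:

CREATE TABLE users (
  userid text PRIMARY KEY,
  first_name text,
  last_name text
) WITH caching = 'all';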
When both row cache and partition key cache are configured, the row cache returns results whenever possible. In the event of a row cache miss, the partition key cache might still provide a hit that makes the disk seek much more efficient. This diagram depicts two read operations on a table with both caches already populated.
One read operation hits the row cache, returning the requested row without a disk seek. The other read operation requests a row that is not present in the row cache but is present in the partition key cache. After accessing the row in the SSTable, the system returns the data and populates the row cache with the row it just read.
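The two read operations described above can be traced with a small Python simulation. It is only a sketch of the lookup order (row cache, then partition key cache, then index search and disk read). The ToyTable class, its counters, and the use of a list position as the "offset" are hypothetical and do not reflect Cassandra's internal data structures.

```python
class ToyTable:
    """Toy read path with a row cache and a partition key cache (illustrative only)."""

    def __init__(self, rows):
        # Stand-in for on-disk data: a list of rows plus an index
        # mapping each partition key to its position in that list.
        self._sstable = list(rows.values())
        self._index = {key: pos for pos, key in enumerate(rows)}
        self.row_cache = {}      # partition key -> full row
        self.key_cache = {}      # partition key -> position in the SSTable
        self.index_lookups = 0   # how often the partition index was searched
        self.disk_reads = 0      # how often row data was read from "disk"

    def read(self, key):
        # 1. Row cache hit: return the row without touching disk.
        if key in self.row_cache:
            return self.row_cache[key]
        # 2. Key cache hit: the position is known, so skip the index search.
        pos = self.key_cache.get(key)
        if pos is None:
            self.index_lookups += 1
            pos = self._index[key]   # raises KeyError for an unknown key
            self.key_cache[key] = pos
        # 3. Read the row from the SSTable and populate the row cache.
        self.disk_reads += 1
        row = self._sstable[pos]
        self.row_cache[key] = row
        return row


table = ToyTable({"alice": {"first_name": "Alice"}, "bob": {"first_name": "Bob"}})
table.row_cache["alice"] = {"first_name": "Alice"}  # pre-populated row cache entry
table.key_cache["bob"] = 1                          # pre-populated key cache entry

first = table.read("alice")   # served from the row cache
second = table.read("bob")    # row cache miss, key cache hit, one disk read
```

In this run the first read touches neither the index nor the disk, and the second read skips the index search but still performs one disk read, after which "bob" is also in the row cache.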
Cassandra memtables have overhead for index structures on top of the actual data they store. If the size of the values stored in the heavily-read columns is small compared to the number of columns and rows themselves, this overhead can be substantial. Rows having this type of data do not lend themselves to efficient row caching.
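To get a feel for when this overhead dominates, a rough back-of-the-envelope calculation can help. The sketch below assumes a fixed per-column bookkeeping cost; the 64-byte figure is a made-up placeholder, not a measured Cassandra value, and the function name estimate_cache_memory is hypothetical.

```python
def estimate_cache_memory(num_rows, columns_per_row, avg_value_bytes,
                          per_column_overhead_bytes):
    """Return (data_bytes, overhead_bytes) for a set of cached rows.

    per_column_overhead_bytes is an assumed, illustrative constant.
    """
    cells = num_rows * columns_per_row
    data_bytes = cells * avg_value_bytes
    overhead_bytes = cells * per_column_overhead_bytes
    return data_bytes, overhead_bytes


ASSUMED_OVERHEAD = 64  # bytes per column; placeholder, not a measured figure

# Many small values: 1 million rows, 20 columns each, 4-byte values.
small_data, small_overhead = estimate_cache_memory(1_000_000, 20, 4, ASSUMED_OVERHEAD)
small_fraction = small_overhead / (small_data + small_overhead)

# Fewer, larger values: same row and column counts, 1000-byte values.
large_data, large_overhead = estimate_cache_memory(1_000_000, 20, 1000, ASSUMED_OVERHEAD)
large_fraction = large_overhead / (large_data + large_overhead)

print(f"small values: {small_fraction:.1%} of memory is overhead")
print(f"large values: {large_fraction:.1%} of memory is overhead")
```

Under these assumed numbers, most of the memory for the small-value rows goes to bookkeeping rather than data, while for the large-value rows the overhead is a small fraction. The real ratio depends on the actual per-column overhead in your Cassandra version.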