Effective tuning depends not only on the types of operations your cluster performs most frequently, but also on the shape of the data itself. For example, Cassandra’s memtables have overhead for index structures on top of the actual data they store. If the size of the values stored in the columns is small compared to the number of columns and rows themselves (sometimes called “skinny rows”), this overhead can be substantial. Thus, the optimal tuning for this type of data is quite different from the optimal tuning for a small number of columns holding more data (“fat rows”).
A memtable is a column-family-specific, in-memory data structure that can be loosely described as a write-back cache. Memtables are flushed to disk, creating SSTables, whenever one of the configurable thresholds is exceeded.
Effectively tuning memtable thresholds depends on your data as much as your write load. Memtable thresholds are configured primarily by MemtableThroughputInMB and MemtableOperationsInMillions. You should increase MemtableThroughputInMB if:
Look instead at adjusting MemtableOperationsInMillions if, as previously described, you have large numbers of skinny rows. For this kind of data, memtable flushes should be tuned using this value instead, to avoid consuming too much memory with metadata.
Note that any upwards adjustment of memtable thresholds will take memory away from caching and other internal Cassandra structures, so tune carefully and in small increments.
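The interaction between the two thresholds and the shape of the data can be sketched with a little arithmetic. The following is a minimal illustration, not Cassandra's implementation: the function name is hypothetical, the threshold values are example inputs, and it assumes every write carries the same number of bytes and ignores per-column overhead.

```python
def first_threshold_hit(bytes_per_write, throughput_mb=64, operations_millions=0.3):
    """Return which flush threshold a stream of equal-sized writes reaches first.

    throughput_mb stands in for MemtableThroughputInMB and
    operations_millions for MemtableOperationsInMillions. The values
    used here are illustrative inputs, not recommendations.
    """
    byte_limit = throughput_mb * 1024 * 1024
    op_limit = round(operations_millions * 1_000_000)
    # Number of writes needed before the byte threshold alone would trigger a flush.
    writes_to_fill_bytes = byte_limit / bytes_per_write
    if writes_to_fill_bytes <= op_limit:
        return "throughput", int(writes_to_fill_bytes)
    return "operations", op_limit


# Skinny rows: many tiny columns reach the operations threshold first.
print(first_threshold_hit(50))
# Fat rows: a few large columns reach the throughput threshold first.
print(first_threshold_hit(4096))
```

Under these assumptions, small values trigger flushes through the operation count long before the byte threshold is reached, which is why the operation count is the more useful knob for skinny rows.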
As previously mentioned, Cassandra’s default configuration opens the JVM with 1GB of memory. Many users new to Cassandra are tempted to turn this value up immediately to consume the majority of the underlying system’s RAM. Doing so in most cases is actually detrimental. The reason for this is that Cassandra, being essentially a database, spends a lot of time interacting with the operating system’s I/O infrastructure (via the JVM of course). Modern operating systems maintain disk caches for frequently accessed data and are very good at keeping this data in memory. Regardless of how much RAM your hardware has, you should keep the JVM heap size constrained by the following formula and allow the operating system’s file cache to do the rest:
MemtableThroughputInMB * 3 * (number of Column Families) + 1G + (size of internal caches)
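As a worked example of the formula above, the small helper below plugs in numbers. The function name and the sample inputs are hypothetical, and it assumes "1G" means 1024 MB and that the size of the internal caches has already been estimated in megabytes.

```python
def suggested_heap_limit_mb(memtable_throughput_mb, column_families, internal_caches_mb):
    """Evaluate: MemtableThroughputInMB * 3 * (number of Column Families)
    + 1G + (size of internal caches), with every term expressed in MB."""
    base_overhead_mb = 1024  # the "1G" term, assumed to mean 1024 MB
    return memtable_throughput_mb * 3 * column_families + base_overhead_mb + internal_caches_mb


# Hypothetical cluster: 64 MB memtable throughput, 4 column families,
# and roughly 256 MB of key and row caches.
print(suggested_heap_limit_mb(64, 4, 256))  # 64*3*4 + 1024 + 256 = 2048
```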
Cassandra’s GCInspector will log information about garbage collection whenever a garbage collection takes longer than 200ms. If garbage collections are occurring frequently and are taking a moderate length of time to complete (such as ConcurrentMarkSweep taking a few seconds), this is an indication that there is a lot of garbage collection pressure on the JVM; this needs to be addressed by adding nodes, lowering cache sizes, or adjusting GC options.
After 0.6.7, GCInspector also logs its usual summary whenever messages are dropped to help determine the cause.
The key cache holds the location of keys in memory on a per-SSTable basis. For column family level read optimizations, turning this value up can have an immediate impact (as soon as the cache warms) when there are large numbers of frequently accessed rows or the size of the columns in the rows makes it impractical to cache the row itself.
Key cache performance can be monitored by using nodetool cfstats and examining the reported ‘Key cache hit rate’. JMX/jconsole may also be used similarly.
Unlike the key cache, the row cache holds the entire contents of the row in memory. It is best used when you have a small subset of data to keep hot and you frequently need most or all of the columns returned. For these use cases, row cache can have substantial performance benefits.
Row cache performance can be monitored by using nodetool cfstats and examining the reported ‘Row cache hit rate’. JMX/jconsole may also be used similarly.
To estimate how much memory the caches consume, either nodetool cfstats or JMX/jconsole can be used to get the necessary information.
To calculate the approximate row cache size, multiply the reported ‘Row cache size’, which is the number of rows in the cache, by the ‘Compacted row mean size’ for each column family, and sum the results over all column families.
To approximate the key cache size, multiply the reported ‘Key cache size’ for each column family by the average size of keys for that column family, and sum the results over all column families.
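The two estimates can be written out as a short calculation. This is only a sketch: the dictionary field names and the sample numbers are hypothetical, the values are assumed to have been copied by hand from nodetool cfstats (or JMX), and sizes are assumed to be in bytes.

```python
def row_cache_bytes(column_families):
    """Sum of (rows in the row cache * compacted row mean size) per column family."""
    return sum(cf["row_cache_size"] * cf["compacted_row_mean_size"]
               for cf in column_families)


def key_cache_bytes(column_families):
    """Sum of (keys in the key cache * average key size) per column family."""
    return sum(cf["key_cache_size"] * cf["avg_key_size"]
               for cf in column_families)


# Hypothetical figures for two column families.
stats = [
    {"name": "users", "row_cache_size": 10_000, "compacted_row_mean_size": 2_048,
     "key_cache_size": 200_000, "avg_key_size": 32},
    {"name": "events", "row_cache_size": 0, "compacted_row_mean_size": 512,
     "key_cache_size": 500_000, "avg_key_size": 48},
]

print(row_cache_bytes(stats))  # 10_000 * 2_048 + 0 * 512 = 20480000
print(key_cache_bytes(stats))  # 200_000 * 32 + 500_000 * 48 = 30400000
```

The results are approximations of payload size only; they do not account for per-entry object overhead inside the JVM.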
Key cache usually provides the most benefit for the least cost. Keys are typically very small compared to row size, so even caching many of them uses little memory. The key cache potentially eliminates one seek per SSTable that needs to be examined during a read, substantially reducing the number of read seeks. Of course, if keys within a column family are accessed uniformly randomly and it is too expensive to keep a large majority of the keys cached, the key cache will not be effective. However, in most applications, the probability of key accesses fits a normal (or Gaussian) distribution, which is conducive to caching. Studying your key access patterns may help you determine if a key cache is appropriate and what size is optimal.
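The effect of the access distribution on cache effectiveness can be illustrated with a small simulation. The sketch below uses a generic LRU cache as a stand-in, which is an assumption and not a description of Cassandra's key cache implementation; the key-space size, cache capacity, and standard deviation are arbitrary illustrative values.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay a sequence of key accesses through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(accesses)


def simulate(num_keys=10_000, num_reads=100_000, capacity=1_000, seed=42):
    """Compare uniformly random key access with normally distributed key access."""
    rng = random.Random(seed)
    uniform = [rng.randrange(num_keys) for _ in range(num_reads)]
    # Normal distribution centred on the middle of the key space,
    # with a standard deviation of 5% of the key space, clamped to valid keys.
    skewed = [min(num_keys - 1, max(0, int(rng.gauss(num_keys / 2, num_keys * 0.05))))
              for _ in range(num_reads)]
    return lru_hit_rate(uniform, capacity), lru_hit_rate(skewed, capacity)


uniform_rate, skewed_rate = simulate()
print(f"uniform access hit rate: {uniform_rate:.2f}")
print(f"normal access hit rate:  {skewed_rate:.2f}")
```

With uniform access, the hit rate stays close to the fraction of the key space that fits in the cache, while the concentrated distribution yields a much higher hit rate for the same cache size.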
Row cache is typically much more expensive per cached item, as the entire row is cached. For row cache to be effective, a very small set of rows must be very hot; in other words, the probability of row access must follow a normal distribution with a very low standard deviation. Row cache also provides a greater benefit when the entire row is accessed at once. Keep in mind that if rows are frequently evicted from the row cache, the garbage collector will be under more pressure, a problem the OS buffer cache does not suffer from.
The OS buffer cache can perform a role similar to the row cache, but more efficiently. There is no need for garbage collection with the OS buffer cache, and it tends to be very effective at keeping hot blocks in the cache. Additionally, the buffer cache caches blocks on both writes and reads, whereas the row cache and the key cache only add items on reads. For these reasons, it is usually a good idea to keep the row cache small or disabled and leave space for the OS buffer cache.