CompanySeptember 29, 2011

What’s New in Cassandra 1.0: Improved Memory and Disk Space Management

Jonathan Ellis
Jonathan EllisTechnology
What’s New in Cassandra 1.0: Improved Memory and Disk Space Management

Cassandra 1.0 incorporates several improvements to how Cassandra's storage engine manages memory and disk space, bringing better performance and addressing some of the most common pain points for Cassandra administration.

Off-heap row cache

Cassandra provides a built-in row cache for super-fast access to frequently requested data, competitive with standalone caching products but without the cache coherence problems that come from using a separate system, i.e., data in the cache becoming temporarily or even permanently out of sync with the database.

Cassandra 1.0 adds the ability to store cached rows in native memory, outside the Java heap. This results in both a smaller per-row memory footprint and reduced JVM heap requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.

This off-heap row cache debuted in 0.8 but it didn't become the default until 1.0. It requires the JNA library to be installed; otherwise, Cassandra will automatically fall back to the old on-heap cache provider. The Debian and RPM packages of Cassandra install JNA automatically, but if you are installing from a tarball or source, you should install JNA as well. (For licensing reasons, JNA can't be distributed as part of Cassandra itself.)

Storage engine self-tuning

Cassandra 1.0's storage engine also self-tunes memtable sizes for the optimum balance between faster writes, reduced compaction overhead, and memory use.

Recall that memtables are the structure where Cassandra groups updates in memory before writing to disk. Prior to 1.0, Cassandra needed to be explicitly told how much space to allocate for each ColumnFamily's memtables.

This was tolerable but clunky for small numbers of ColumnFamilies, but whenever additional ColumnFamilies were created, the settings on the existing ones needed to be adjusted to make room. Getting this wrong was the primary cause of OutOfMemoryErrors.

Cassandra 1.0 only uses one memtable setting: memtable_total_space_in_mb (found in cassandra.yaml), which defaults to 1/3 of your JVM heap. Cassandra manages this space across all your ColumnFamilies and flushes memtables to disk as needed. This has been tested to work across hundreds or even thousands of ColumnFamilies. (Do note that a minimum of 1MB per memtable is used by the per-memtable arena allocator also introduced in 1.0, which is worth keeping in mind if you are looking at going from thousands to tens of thousands of ColumnFamilies.)

memtable_total_space_in_mb was introduced in 0.8 and has proved successful enough that for 1.0 we've disabled the old per-ColumnFamily memtable_operations_in_millions and memtable_throughput_in_mb settings. For backwards compatibility, Cassandra still accepts those settings from applications, but they are ignored.

Cassandra 1.0 also introduces a global commitlog_total_space_in_mb, which replaces the old memtable_flush_after_mins per-ColumnFamily setting. The purpose here is to set a bound on how much data will need to be replayed on startup. Since there is a single commitlog per server, an infrequently-updated ColumnFamily could otherwise keep CommitLog segments around for an arbitrarily long time. commitlog_total_space_in_mb will cause any unflushed memtables in the oldest CommitLog segments to be written to disk when its threshold is exceeded, allowing those segments to be removed.

Faster disk space reclamation

Cassandra 1.0 also improves on the disk side of the storage engine, using explicit reference counting to reclaim obsolete data files post-compaction.

In older versions, Cassandra cleaned up these data files when the JVM's garbage collection ran, which resulted in unpredictable reclaiming of space. Cassandra 1.0 provides behavior that matches what operators would naturally expect, and avoids ugly hacks like forcing a GC cycle when disk space is low, which Cassandra would do automatically as a failsafe.

Previously

Share

One-stop Data API for Production GenAI

Astra DB gives JavaScript developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.