DataStax Developer Blog

Six mid-series changes to know about in Cassandra 1.2.x

By Jonathan Ellis - May 27, 2013

In principle, minor Cassandra releases contain only bug fixes. In practice, some minor improvements usually slip in, especially shortly after a major release. The 1.2 series has been unusually busy in that respect. It’s worth calling attention to six in particular:

Improved compaction throttling

Compaction is the process by which Cassandra combines multiple data files to improve the performance of partition scans and to reclaim space from deleted data.

On traditional hard disks, compaction can compete with reads for disk I/O, so Cassandra has supported throttling compaction for a long time. Ideally, you want compaction running continuously at a fairly low rate, rather than causing abrupt load spikes.

For 1.2.5, we’ve improved the compaction_throughput_mb_per_sec setting to work better with large partitions. In older releases, Cassandra only checked the compaction throughput between partitions, so a single large partition could still cause a spike of I/O demand before the throttle had a chance to apply.
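
To make the difference concrete, here is a minimal Python sketch of the idea. It is not Cassandra's code: the function name, the chunk size, and the partition sizes are all made up for illustration. It only measures how many bytes can be written between two consecutive throttle checks when the check happens at partition boundaries versus after every chunk.

```python
def largest_unthrottled_burst(partition_sizes, chunk_size, check_within_partition):
    """Return the most bytes written between two consecutive throttle checks.

    Hypothetical model: a "throttle check" is any point where the compactor
    could pause to stay under its configured rate.
    """
    largest = 0
    since_last_check = 0
    for size in partition_sizes:
        remaining = size
        while remaining > 0:
            written = min(chunk_size, remaining)
            remaining -= written
            since_last_check += written
            if check_within_partition:
                # Newer behavior (as modeled here): check after every chunk.
                largest = max(largest, since_last_check)
                since_last_check = 0
        # Both behaviors check at the partition boundary.
        largest = max(largest, since_last_check)
        since_last_check = 0
    return largest


MB = 1024 * 1024
sizes = [4 * MB, 512 * MB, 2 * MB]  # one unusually large partition

old = largest_unthrottled_burst(sizes, MB, check_within_partition=False)
new = largest_unthrottled_burst(sizes, MB, check_within_partition=True)
print(old // MB, "MB burst when checking only between partitions")
print(new // MB, "MB burst when checking within partitions")
```

In this toy model, the worst-case burst is the size of the largest partition when checks happen only at boundaries, and the size of one chunk otherwise.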

Further reduced memory consumption

Cassandra 1.2.0 moved most data structures off-heap. Cassandra 1.2.5 reduces the memory needed by the last on-heap structure, the partition summary.

This allows Cassandra to handle many TB of data per process. As one user put it, “Now I run out of disk before I run out of Cassandra.”

Removed cell-name bloom filters

Most Cassandra users are familiar with the bloom filters Cassandra maintains to track what partition keys are contained in each data file. The cell-name bloom filter that is stored as part of the header for each partition is less well known; frankly, it wasn’t very useful.

We took it out for 2.0, but the change was a bigger one than we’d usually make in a stable release, so we left it alone in 1.2. Then we realized that cell-name bloom filters actually cause a rare bug with the range tombstones introduced in 1.2.0. So, we ended up removing the filter in 1.2.5 after all.

The most noticeable effect will be faster queries against large partitions, since the filter is no longer stored in, and read as part of, the partition header.
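
For readers who have not met bloom filters before, here is a minimal Python sketch of the data structure itself. It is an illustration only, not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of SHA-256 as a stand-in hash function are all assumptions chosen to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        # SHA-256 is a stand-in here, not what Cassandra uses.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))
```

A "definitely absent" answer is what lets a reader skip a data file (or, for the removed cell-name filter, skip looking for a cell) without touching the data itself. A "possibly present" answer still requires the real lookup.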

Thread-local allocation

Cassandra has somewhat belatedly enabled the UseTLAB JVM flag, for about a 15% performance boost to reads. This is the default now in Cassandra 1.1.11+ and 1.2.4+.

PasswordAuthenticator and CassandraAuthorizer

As part of our work on security for DSE 3.0, DataStax contributed authentication and authorization implementations to Cassandra that store their metadata in Cassandra system tables. Kerberos integration, auditing, and other features such as on-disk encryption are available in DataStax Enterprise.

LZ4 compression

Finally, Cassandra began shipping with support for Adrien Grand’s LZ4 port to Java in 1.2.2. LZ4 is about 50% faster at compression than Snappy (which remains Cassandra’s default until 2.0).

More

See the CHANGES file for an exhaustive list of changes since 1.2.0.

The 2013 Cassandra Summit is in two weeks! This will be the most information-dense two days about Cassandra ever, with 65 sessions from Accenture, Barracuda Networks, Blue Mountain Capital, Comcast, Constant Contact, eBay, Fusion-io, Intuit, Netflix, Sony, Splunk, Spotify, Walmart, and more. Check out the ten talks I’m most looking forward to, and register today with the code SFSummit25 for a 25% discount.


