DataStax Developer Blog

The annotated changelog: Cassandra 0.6.1

By Jonathan Ellis - July 26, 2010

In the run-up to Apache Cassandra 0.7, Riptano will be covering the new features and improvements in a series of blog posts.

But before we start digging into what’s coming in 0.7, I’d like to briefly cover what’s been going on quietly in the stable updates to the Cassandra 0.6 series. Stable updates are primarily about bug fixes, but occasionally they also bring performance enhancements or even minor features at the operational level.

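0.6.1 saw one example each of a performance enhancement (caching the length of read-only BufferedRandomAccessFile instances) and a new feature (exposing drain via nodetool), plus a third change worth mentioning for a different reason (a fix for nodes with IPv6-only addresses).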

Let’s explore these in more detail.

Cache read-only BufferedRandomAccessFile length

One of the gaping holes in the Java standard library is the lack of buffering in RandomAccessFile. This has caused a lot of grief over the years, and several third-party implementations of such buffering have resulted, including Cassandra’s, which extends the JDK RandomAccessFile and overrides only the minimum necessary.

When Johan Oskarsson profiled the Hadoop InputFormat added in 0.6, he found that a surprising amount of time was spent calling RandomAccessFile.length. Some investigation revealed that not only does RandomAccessFile not cache the file length for read-only files, it also makes three system calls for each invocation instead of the single call (to fstat) that you would expect. (As a side effect, the method is not thread-safe either, although that doesn’t affect Cassandra.) So we added the overridden length method that you see now.
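To make the idea concrete, here is a minimal sketch of caching the length of a file opened read-only. It is not Cassandra’s actual BufferedRandomAccessFile; the class name CachedLengthFile and its fields are hypothetical, and the sketch assumes the file is not modified by anyone else while it is open (which holds for sstables, since they are immutable once written).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration only; not Cassandra's BufferedRandomAccessFile.
class CachedLengthFile extends RandomAccessFile {
    // -1 means "not cached yet"; only ever populated for read-only files.
    private long cachedLength = -1;
    private final boolean readOnly;

    public CachedLengthFile(File file, String mode) throws IOException {
        super(file, mode);
        this.readOnly = "r".equals(mode);
    }

    @Override
    public long length() throws IOException {
        if (!readOnly) {
            // A writable file can grow, so always ask the JDK.
            return super.length();
        }
        if (cachedLength < 0) {
            // First call pays for the underlying system calls; later calls do not.
            cachedLength = super.length();
        }
        return cachedLength;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("cached-length", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile writer = new RandomAccessFile(tmp, "rw")) {
            writer.write(new byte[] {1, 2, 3, 4, 5});
        }
        try (CachedLengthFile reader = new CachedLengthFile(tmp, "r")) {
            System.out.println(reader.length()); // goes to the JDK
            System.out.println(reader.length()); // served from the cached field
        }
    }
}
```

The trade-off in this sketch is that a cached length goes stale if the file grows underneath the reader, which is why the cache is limited to read-only mode.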

Expose drain via nodetool

Cassandra upgrades guarantee that changes to the on-disk sstable data format, if any, will be handled transparently by Cassandra. That is, the new version will be able to read the old version, and as compactions happen the new version will be written out.

But we do not similarly handle changes to the CommitLog on-disk format. If any such changes are made (in major versions only, such as 0.7), then the CommitLog needs to be empty when Cassandra is restarted after the upgrade. To make this easier, we added the “drain” command to the JMX admin interface and the nodetool command-line tool: drain tells the process to stop accepting any further writes, and to flush any data in the CommitLog so it can be safely removed.
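The two halves of drain (refuse new writes, then flush what is pending) can be illustrated with a toy model. This is only a sketch of the semantics described above, not Cassandra’s implementation: DrainableNode, its fields, and its methods are all hypothetical names, and the in-memory lists merely stand in for the CommitLog and for flushed sstable data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of drain semantics; not Cassandra's code.
class DrainableNode {
    // Stand-in for writes recorded in the CommitLog but not yet flushed.
    private final List<String> commitLog = new ArrayList<>();
    // Stand-in for data that has been flushed to sstables.
    private final List<String> flushed = new ArrayList<>();
    private boolean drained = false;

    public synchronized void write(String mutation) {
        if (drained) {
            throw new IllegalStateException("node is drained; not accepting writes");
        }
        commitLog.add(mutation);
    }

    // Step 1: stop accepting writes. Step 2: flush everything still pending,
    // leaving the commit log empty so it can be safely removed.
    public synchronized void drain() {
        drained = true;
        flushed.addAll(commitLog);
        commitLog.clear();
    }

    public synchronized int commitLogSize() {
        return commitLog.size();
    }

    public synchronized int flushedCount() {
        return flushed.size();
    }

    public static void main(String[] args) {
        DrainableNode node = new DrainableNode();
        node.write("insert k1");
        node.write("insert k2");
        node.drain();
        System.out.println("pending in commit log: " + node.commitLogSize());
        System.out.println("flushed: " + node.flushedCount());
    }
}
```

In this model the ordering matters: writes are rejected before the flush starts, so nothing new can land in the commit log after it has been emptied.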

Nodes with IPv6 (and no IPv4) addresses could not join cluster

This one is notable for a different reason. The Apache Cassandra project’s goal is for stable releases to be 100% drop-in compatible with other releases in the same major series (e.g., 0.6). But in making this fix for IPv6 clusters, we inadvertently made 0.6.1 internal network traffic incompatible with 0.6.0. As a result, you had to perform a full-cluster restart when upgrading, rather than a “rolling” restart of one node at a time. Unfortunately, having released 0.6.1, we couldn’t fix this in 0.6.2 without causing the same problem in reverse for those already upgraded to 0.6.1. Lesson learned, I hope.

… and on that cautionary note, we’ve covered the highlights from 0.6.1. Next time: 0.6.2.



Comments

  1. Richard grossman says:

    As expected, Cassandra keeps getting better, with so many features added since version 0.4. But how can we help developers persuade management to adopt it? Why is there so much fear when switching to something new?
    If you could write an article on these points, it would really help us.
