The Cassandra annotated changelog: 0.6.3
Cassandra 0.6.3 brought bug fixes and a number of minor improvements, but two important changes to cluster performance stand out:
- Remove hourly scan of all hints … instead, expose deliverHintsToEndpoint to JMX so it can be done manually
- Add ability to lower compaction priority
Remove hourly scan of all hints
As I mentioned during my State of Cassandra talk, hinted handoff is one of those ideas that’s trickier to implement correctly than it looks. Hinted handoff is a feature described in Amazon’s Dynamo where writes intended for a failed node may be “hinted” to another, which will then “hand them off” to the real destination when it recovers.
Prior to 0.6.3, Cassandra would check to see if it had any hinted data to hand off either (a) when the failure detector notified it that another node was back online, or (b) during an hourly scan of all hints, just in case.
It’s good practice in systems design to include safety nets like this, but this one was a problem. If a node was down for an extended period of time while the cluster was under a write-heavy workload, a large number of hints would be generated. Scanning through gigabytes of hint data every hour would then consume large amounts of CPU while simultaneously evicting more-important data from the OS buffer cache.
So we scrapped the hourly scan, and as a replacement safety net added a JMX method so an operator can manually instruct Cassandra to hand off hinted data to a designated node.
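To make the manual safety net concrete, here is a minimal sketch of a JMX client that asks one node to deliver its hints to a recovered node. The MBean name and the operation name below are assumptions on my part, not taken from the release itself, so check them in jconsole against your own node before relying on them. The JMX port is whatever your cassandra.in.sh configures.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DeliverHintsSketch {
    // Assumption: the object name the hinted handoff manager is registered under.
    static final String HINTS_MBEAN = "org.apache.cassandra.db:type=HintedHandoffManager";
    // Assumption: the name of the exposed operation, taking the target host as a String.
    static final String DELIVER_OP = "deliverHints";

    // Builds the standard RMI-based JMX service URL for a host and port.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Connects to the node holding the hints and asks it to hand them off
    // to the recovered node.
    static void deliverHints(String hintHolder, int jmxPort, String recoveredNode) throws Exception {
        JMXConnector connector =
            JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl(hintHolder, jmxPort)));
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            mbs.invoke(new ObjectName(HINTS_MBEAN), DELIVER_OP,
                       new Object[] { recoveredNode },
                       new String[] { String.class.getName() });
        } finally {
            connector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: DeliverHintsSketch <hint-holder> <jmx-port> <recovered-node>");
            return;
        }
        deliverHints(args[0], Integer.parseInt(args[1]), args[2]);
    }
}
```

The same call can be made interactively from jconsole by browsing to the MBean and invoking the operation by hand, which is probably how most operators would use it.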
Ability to lower compaction priority
Compaction is when Cassandra merges multiple SSTable data files into a single new one, throwing away obsolete column versions in the process. On I/O-bound workloads (i.e., most read-heavy workloads) this creates high variance in request latency: low during periods of compaction quiescence, and elevated when compaction is competing for I/O with the read workload. Lowering the priority of the compaction thread mitigates this by smoothing compaction out across a longer period.
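As a rough illustration of the merge step, the sketch below models each SSTable as a sorted map from column name to a timestamped value and keeps only the newest version of each column. This is a toy model, not Cassandra's code: the Column class and compact method are hypothetical names, and real compaction streams rows from disk and also has to deal with deletions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompactionSketch {
    // Hypothetical stand-in for a column: a value plus the write timestamp.
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Merges several "SSTables" into one, keeping the highest-timestamp
    // version of each column and dropping the obsolete ones.
    static TreeMap<String, Column> compact(List<Map<String, Column>> sstables) {
        TreeMap<String, Column> merged = new TreeMap<>();
        for (Map<String, Column> sstable : sstables) {
            for (Map.Entry<String, Column> entry : sstable.entrySet()) {
                Column current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Column> older = new TreeMap<>();
        older.put("email", new Column("old@example.com", 100L));
        older.put("name", new Column("alice", 100L));

        Map<String, Column> newer = new TreeMap<>();
        newer.put("email", new Column("new@example.com", 200L));

        List<Map<String, Column>> inputs = new ArrayList<>();
        inputs.add(older);
        inputs.add(newer);

        for (Map.Entry<String, Column> e : compact(inputs).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue().value);
        }
    }
}
```

Even in this toy form, every column of every input has to be read and compared, which hints at why compaction competes with reads for both I/O and CPU.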
Since 0.6.3 is part of a stable release series, this behavior is off by default. To enable it, add these options to cassandra.in.sh:
(Java thread priorities range from 1 to 10, with 1 being the lowest.)
Careful readers will note that Java does not provide access to ionice, and while Cassandra is starting to use native code to provide enhanced performance in places, this wasn’t one of those. It turns out that compaction uses enough CPU to deserialize each column from each row in the affected SSTables and reserialize the merged rows that simply reducing CPU priority is effective.
Even more careful readers will note that Java does not permit non-root processes to change thread priorities on Linux, even though Linux is perfectly happy to allow such a process to reduce thread priority, which is all we care about here. The odd-looking “ThreadPriorityPolicy=42” option takes advantage of a bug in the Sun JVM to bypass that restriction.
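For readers who want to see the Java side of this, here is a minimal sketch of starting a background thread at a reduced priority taken from a system property. It is an illustration, not Cassandra's compaction code. The class and method names are hypothetical, and the property name and the -XX:+UseThreadPriorities flag in the comments are written from memory and should be treated as assumptions. Only ThreadPriorityPolicy=42 is named in the text above.

```java
public class CompactionPrioritySketch {
    // Hypothetical launch line (spellings other than ThreadPriorityPolicy=42 are assumptions):
    //   java -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 \
    //        -Dcassandra.compaction.priority=1 CompactionPrioritySketch

    // Reads the desired priority from a system property, defaulting to the
    // normal priority and clamping to Java's legal range of 1 to 10.
    static int configuredPriority() {
        int requested = Integer.getInteger("cassandra.compaction.priority", Thread.NORM_PRIORITY);
        return Math.max(Thread.MIN_PRIORITY, Math.min(Thread.MAX_PRIORITY, requested));
    }

    // Creates (but does not start) a daemon thread with the given priority.
    static Thread newBackgroundThread(Runnable work, int priority) {
        Thread thread = new Thread(work, "compaction-sketch");
        thread.setPriority(priority);
        thread.setDaemon(true);
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = newBackgroundThread(
            () -> System.out.println("running at priority " + Thread.currentThread().getPriority()),
            configuredPriority());
        worker.start();
        worker.join();
    }
}
```

Note that setPriority only records the request on the Java side. Whether the operating system scheduler actually honors it on Linux depends on the JVM options discussed above.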