Thrift HSHA users should run, not walk, to upgrade
Cassandra’s legacy Thrift API (distinct from the native CQL protocol) supports pluggable socket-handling server implementations. The default, sync, dedicates a thread to each client and performs better for typical numbers of (usually pooled) client connections. The alternative, hsha (half-synchronous, half-asynchronous), supports larger numbers of concurrent client connections, e.g. from PHP applications, where connection pooling is difficult.
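To make the distinction concrete: a thread-per-client server blocks one thread on each socket, while a half-sync/half-async server lets a single I/O loop multiplex every connection and hands complete requests to a small, fixed pool of worker threads. The following is a minimal Python sketch of that general pattern only, not Cassandra's actual Thrift or Disruptor-based code; `serve_hsha`, `handle_request`, and the newline-delimited framing are hypothetical choices for illustration.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload: bytes) -> bytes:
    # Hypothetical stand-in for real request processing (e.g. executing a query).
    return payload.upper()


def serve_hsha(conns, total_requests, workers=2):
    """Half-sync/half-async sketch: one I/O loop multiplexes all connections
    (async half); a small fixed thread pool runs the handlers (sync half)."""
    sel = selectors.DefaultSelector()
    buffers = {}
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
        buffers[conn] = b""
    in_flight = []  # (connection, future) pairs awaiting a worker result
    replied = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while replied < total_requests:
            # Async half: wait briefly for readable sockets, with no thread per client.
            for key, _ in sel.select(timeout=0.01):
                conn = key.fileobj
                chunk = conn.recv(4096)
                if not chunk:  # peer closed the connection
                    sel.unregister(conn)
                    continue
                buffers[conn] += chunk
                # Newline-delimited framing: dispatch each complete request to the pool.
                while b"\n" in buffers[conn]:
                    line, buffers[conn] = buffers[conn].split(b"\n", 1)
                    in_flight.append((conn, pool.submit(handle_request, line)))
            # Send finished results back to their clients from the I/O loop.
            still_waiting = []
            for conn, fut in in_flight:
                if fut.done():
                    conn.sendall(fut.result() + b"\n")
                    replied += 1
                else:
                    still_waiting.append((conn, fut))
            in_flight = still_waiting
    sel.close()


if __name__ == "__main__":
    clients, servers = zip(*(socket.socketpair() for _ in range(3)))
    for i, client in enumerate(clients):
        client.sendall(b"req%d\n" % i)
    serve_hsha(list(servers), total_requests=3)
    print([client.recv(1024) for client in clients])  # [b'REQ0\n', b'REQ1\n', b'REQ2\n']
```

Here three clients share one I/O loop and two workers instead of three dedicated threads. The price is an extra hand-off between the I/O loop and the pool on every request, which is plausibly why a thread per connection wins when the number of pooled connections is modest.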
For Cassandra 2.0.0, the hsha server was rewritten on top of the Disruptor, delivering substantial performance improvements over the old thread-pool-based hsha. Unfortunately, the rewrite introduced a bug that can cause incorrect data to be sent from the coordinator to replicas. I apologize that it took us so long to identify this bug as the cause of the compaction errors reported as far back as November.
Anyone running the hsha server on an earlier 2.0.x release should upgrade immediately: this bug can cause data loss.
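Whether a node is affected comes down to two facts: its `rpc_server_type` setting in cassandra.yaml (sync is the default) and its 2.0.x release. The following is a hypothetical Python helper, not a Cassandra tool, that checks both. It uses a simple line-based match rather than a full YAML parser, and it takes the release containing the fix as a caller-supplied parameter, to be filled in from the release notes.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '2.0.5' into (2, 0, 5); trailing suffixes such as '-beta1' are ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def hsha_upgrade_needed(yaml_text: str, running: str, fixed: str) -> bool:
    """True if the node uses the hsha Thrift server on a 2.0.x release older than `fixed`.

    `fixed` is a placeholder supplied by the caller for the release containing the fix.
    """
    # Line-based match on an uncommented key; this is not a full YAML parser.
    match = re.search(r"^\s*rpc_server_type:\s*['\"]?(\w+)", yaml_text, re.MULTILINE)
    server_type = match.group(1).lower() if match else "sync"  # sync is the default
    current = parse_version(running)
    return server_type == "hsha" and current[:2] == (2, 0) and current < parse_version(fixed)
```

A commented-out `# rpc_server_type: hsha` line is deliberately not matched, so such a node is treated as running the default sync server.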
Other reasons to upgrade
Since I last covered developments in the 2.0 release series, Cassandra has added a number of reasons to upgrade beyond the bug fixes:
- Static columns and lightweight transaction batching
- Gossip performance improvements for large clusters
- Atomic batch performance improvements
- Eliminating the possibility of a thundering herd of routing calculations in virtual-node clusters