Jonathan Ellis

<p>The headlining features in 2.0 are&nbsp;<a href="https://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0">lightweight transactions</a>,&nbsp;<a href="https://www.datastax.com/dev/blog/cql-in-cassandra-2-0">CQL enhancements</a>, and&nbsp;<a href="https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support">triggers</a>. But 2.0 also features a lot of internal optimizations and improvements!</p>

<h3>Performance optimization</h3>

<ul>
	<li>Tracking statistics on clustered columns allows&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5514">eliminating unnecessary sstables from the read path</a>.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-4180">Single-pass compaction</a>&nbsp;roughly doubles compaction speed for large partitions and reduces the impact on the JVM heap and GC.</li>
	<li>Leveled compaction&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5371">now performs size-tiered compaction in L0</a>&nbsp;when it gets behind. This keeps read performance from deteriorating until leveling can catch back up. We've also&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5727">dramatically increased LCS sstable size</a>.</li>
	<li>For applications still using Thrift, the new&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5582">half-synchronous, half-asynchronous server based on LMAX Disruptor</a>&nbsp;cuts Thrift overhead dramatically.</li>
	<li>Faster partition index lookups and cache reads by&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5884">improving performance of off-heap memory</a>.</li>
	<li>Faster reads of compressed data by&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5862">switching from CRC32 to Adler checksums</a>.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-3997">JEMalloc support for off-heap allocation</a>.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-4885">Removing partition-level bloom filters</a>&nbsp;improves read performance by eliminating the bloom filter deserialization from each operation and reducing GC churn.</li>
</ul>
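
<p>To illustrate the first item above, here is a minimal sketch of how per-sstable statistics on a clustered column can be used to skip sstables on the read path. It is not Cassandra's implementation: the <tt>SSTableStats</tt> class, its fields, and the integer clustering values are assumptions made up for this example.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical per-sstable summary: smallest and largest clustering value it holds."""
    name: str
    min_clustering: int
    max_clustering: int


def sstables_to_read(sstables, slice_start, slice_end):
    """Keep only the sstables whose [min, max] range overlaps the requested slice."""
    return [
        s for s in sstables
        if s.max_clustering >= slice_start and s.min_clustering <= slice_end
    ]


# Three illustrative sstables covering different clustering ranges.
sstables = [
    SSTableStats("a", 0, 99),
    SSTableStats("b", 100, 199),
    SSTableStats("c", 150, 300),
]

# A slice query for clustering values 120..160 never needs to touch sstable "a".
selected = sstables_to_read(sstables, 120, 160)
print([s.name for s in selected])
```

<p>The check only needs two values per sstable, so it can run before any data is read from disk.</p>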

<h3>Spring cleaning</h3>

<ul>
	<li>Removed compatibility with pre-1.2.5 sstables and pre-1.2.9 schema. Upgrade to 1.2.9 or a later 1.2.x release first.</li>
	<li>SuperColumns are gone internally,&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-3237">replaced by composite cells</a>. The SuperColumn API is retained and translated transparently to maintain backwards compatibility. (Richard Low has a good writeup of&nbsp;<a href="http://www.wentnet.com/blog/?p=38">why supercolumns are obsolete</a>.)</li>
	<li>The potentially dangerous countPendingHints JMX call has been&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5746">replaced by a Hints Created metric</a>, which is performant enough to be monitored regularly and also eliminates the possibility of OOM-ing your node.</li>
	<li>The&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5348">on-heap partition cache has been removed</a>, leaving only the off-heap option.</li>
	<li>Vnodes are on by default, and the old token range bisection code for non-vnode clusters is&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5518">gone</a>. When not using vnodes, specify a token manually or one will be chosen randomly.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-3534">Removed emergency memory pressure valve logic</a>. The intent here was to give operators enough breathing room to fix misconfigurations causing heap pressure, but it was never as reliable as we would have liked. And now that the important storage engine metadata has been moved off-heap, memory shortages will be obvious much earlier.</li>
</ul>
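
<p>For clusters that do not use vnodes, tokens now have to be supplied by hand to get an even distribution. The sketch below computes evenly spaced tokens, assuming the Murmur3Partitioner and its token range of -2<sup>63</sup> to 2<sup>63</sup>-1; the function name <tt>evenly_spaced_tokens</tt> is made up for this example. Each resulting value would go into one node's <tt>initial_token</tt> setting in <tt>cassandra.yaml</tt>.</p>

```python
def evenly_spaced_tokens(node_count):
    """Return one token per node, spread evenly over the Murmur3 range [-2**63, 2**63 - 1]."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count  # width of each node's share of the ring
    return [i * step - 2**63 for i in range(node_count)]


# Tokens for a hypothetical four-node cluster.
for token in evenly_spaced_tokens(4):
    print(token)
```

<p>Other partitioners use different token ranges, so this arithmetic does not carry over to them unchanged.</p>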

<h3>Operational concerns</h3>

<ul>
	<li>Java 7 is now required!</li>
	<li>Leveled compaction level information has been&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-4872">moved into sstable metadata</a>&nbsp;-- each sstable knows what level it's at, so there is no need for a separate manifest. This makes leveled compaction more robust and snapshots simpler.</li>
	<li>Kernel page cache skipping has been&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-4937">removed in favor of optional row preheating</a>.</li>
	<li><a href="https://www.datastax.com/dev/blog/streaming-in-cassandra-2-0">Streaming has been rewritten</a>&nbsp;to be more transparent and robust.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-5772">Streaming support for old-version sstables</a>&nbsp;means you no longer have to manually run&nbsp;<tt>upgradesstables</tt>&nbsp;across the cluster before you can perform repairs. It also means you can bulk load old snapshots directly.</li>
</ul>
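
<p>The point about level information living in sstable metadata can be sketched as follows: if every sstable records its own level, the leveled layout can be rebuilt from the sstables alone, with no separate manifest to keep in sync or to copy into snapshots. The metadata dictionary and the <tt>rebuild_levels</tt> function are hypothetical and only illustrate the idea.</p>

```python
from collections import defaultdict


def rebuild_levels(sstable_metadata):
    """Group sstable names by the level recorded in each sstable's own metadata."""
    levels = defaultdict(list)
    for name, meta in sstable_metadata.items():
        levels[meta["level"]].append(name)
    # Return plain dict ordered by level, with names sorted for stable output.
    return {level: sorted(names) for level, names in sorted(levels.items())}


# Hypothetical metadata as it might be read back from a snapshot directory.
snapshot = {
    "sstable-3": {"level": 1},
    "sstable-1": {"level": 0},
    "sstable-2": {"level": 1},
    "sstable-4": {"level": 2},
}

print(rebuild_levels(snapshot))
```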


What’s under the hood in Cassandra 2.0
