Jonathan Ellis

Besides improvements to&nbsp;compaction&nbsp;and&nbsp;repair, 2.1 brings dramatic improvements to the core read and write paths. The two most important changes were:

<ol>
	<li>Adding&nbsp;response grouping&nbsp;to the CQL dispatcher, on a principle similar to&nbsp;Nagle's algorithm.</li>
	<li>Introducing the&nbsp;SharedExecutorPool&nbsp;for worker threads on replicas.</li>
</ol>
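
To make the first item concrete, here is a minimal sketch of the general idea behind response grouping: buffer small responses and send them in one write once enough bytes have accumulated, or when nothing else is ready. This is an illustration of the principle only, not Cassandra's dispatcher code; the class name ResponseGrouper, the send callback, and the max_batch_bytes threshold are all hypothetical.

```python
from typing import Callable, List


class ResponseGrouper:
    """Coalesces small responses into fewer, larger writes (hypothetical sketch)."""

    def __init__(self, send: Callable[[bytes], None], max_batch_bytes: int = 1024):
        self._send = send              # callback that performs the actual write
        self._max = max_batch_bytes    # assumed flush threshold, not a Cassandra setting
        self._pending: List[bytes] = []
        self._pending_bytes = 0

    def submit(self, response: bytes) -> None:
        # Buffer the response and flush once the batch is large enough.
        self._pending.append(response)
        self._pending_bytes += len(response)
        if self._pending_bytes >= self._max:
            self.flush()

    def flush(self) -> None:
        # Also meant to be called by the caller when no further responses are ready.
        if not self._pending:
            return
        self._send(b"".join(self._pending))
        self._pending.clear()
        self._pending_bytes = 0


writes: List[bytes] = []
grouper = ResponseGrouper(writes.append, max_batch_bytes=8)
for r in (b"aaa", b"bbb", b"ccc", b"d"):
    grouper.submit(r)
grouper.flush()  # drain whatever is left
print(len(writes))  # 2 writes for 4 responses
```

The trade-off is the same as with Nagle's algorithm: fewer, larger writes cost less per response, in exchange for a small delay before a buffered response goes out.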

On reads, these two changes combine for a 75% performance boost over 2.0 CQL, and 160% over Thrift:
<img alt="75% Performance Boost " data-align="center" data-entity-type="file" data-entity-uuid="8599c27e-42ab-4af0-9bc8-9b29d014ae37" src="https://www.datastax.com/sites/default/files/inline-images/Screen-Shot-2014-07-16-at-10.46.57-AM-700x529.png" />
On writes, we see a similar improvement -- 95% better than 2.0 CQL, and 150% better than Thrift:
<img alt="CQL vs. Thrift" data-align="center" data-entity-type="file" data-entity-uuid="db558564-a594-474b-8178-7a882ea0fbaf" src="https://www.datastax.com/sites/default/files/inline-images/Screen-Shot-2014-07-16-at-10.48.30-AM-700x516.png" />
But wait! Why is write performance so inconsistent in 2.1? Writes mostly cruise along at over 190k ops/s, but frequently dip as low as 120k, so the average works out to only about 180k.
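
As a rough sanity check on those numbers, the sketch below asks how much of the run would have to be spent in dips for the average to land at 180k. It assumes a simplified two-level model (steady at 190k ops/s, dips at exactly 120k ops/s), which is an assumption for illustration; the real throughput varies continuously, and the function name dip_fraction is hypothetical.

```python
def dip_fraction(steady: float, dip: float, average: float) -> float:
    """Fraction of time spent at `dip` so the time-weighted mean equals `average`."""
    # average = steady * (1 - f) + dip * f  =>  f = (steady - average) / (steady - dip)
    return (steady - average) / (steady - dip)


f = dip_fraction(steady=190_000, dip=120_000, average=180_000)
print(f"{f:.1%}")  # 14.3%
```

Under that model, roughly one seventh of the run is spent in a dip, which fits the description of the dips as frequent.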

It turns out that after we wrote a custom&nbsp;<a href="https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/utils/btree/BTree.java">in-memory BTree</a>&nbsp;to replace SnapTreeMap and removed the&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5549">switchlock contention</a>, writes on this 32-core VM are now bottlenecked on the (single) commitlog disk. We confirmed this by testing with&nbsp;durable writes&nbsp;disabled, but that's not a very useful scenario for production. So we're prioritizing&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-6809">commitlog compression</a>&nbsp;and support for&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-7075">multiple commitlog volumes</a>.

Final thoughts:

<ul>
	<li>CQL is delivering on its promise of a substantial performance boost over Thrift. Even if you only care about performance and not the&nbsp;productivity benefits of CQL, I strongly recommend against Thrift unless you are maintaining a legacy code base.</li>
	<li>Some environments will benefit more than others from the improvements here. EC2 seems particularly happy with the new executor pool; other hardware may see different gains. Our two-year-old, 8-core test machines with 6 SATA disks saw "only"&nbsp;a 50% improvement on reads&nbsp;and&nbsp;a 60% improvement on writes.</li>
</ul>

Cassandra 2.1: now over 50% faster
