Jonathan Ellis

A common misconception is that masterless databases like Cassandra are designed to tolerate network partitions, which are&nbsp;<a href="https://twitter.com/kellabyte/status/391076376955998208">quite possible</a>&nbsp;but relatively uncomon within a single data center compared to other failures.

A more complete understanding is that Cassandra is designed to tolerate&nbsp;any&nbsp;failure without interruption, whether due to node failures, garbage collection pauses, or slow disks,&nbsp;<a href="http://pl.atyp.us/wordpress/index.php/2009/11/availability-and-partition-tolerance/">all of which are indistinguishable from partitions to the rest of the system</a>.
<img alt="Cassandra Fault Tolerance" data-align="center" data-entity-type="file" data-entity-uuid="0d062e59-7b49-40a1-ad33-ec38a85784c9" src="https://www.datastax.com/sites/default/files/inline-images/rrp-700x399.png" />
Cassandra fault tolerance. Click for details

An even more complete understanding is that designing to accomodate failure also drives latency lower: since there are no masters and any replica can answer requests, Cassandra has no artificial bottleneck that makes latency spike whenever there is a hiccup.

HBase may be the most well-known master-oriented distributed database competing with Cassandra. While HBase developer Enis Soztutar reports that "<a href="https://issues.apache.org/jira/browse/HBASE-10070">[i]n the present HBase architecture, it is hard, probably impossible, to satisfy constraints like 99th percentile of the reads will be served under 10 ms,</a>"&nbsp;Cassandra easily achieves this in many clusters.

Here's one example, from a production cluster in England (in microseconds):
<img alt="Production Cluster Example" data-align="center" data-entity-type="file" data-entity-uuid="1bbaf377-5940-4f6f-ad78-b2edf60ff4ea" src="https://www.datastax.com/sites/default/files/inline-images/99th.png" />
Part of the explanation is that Cassandra is simply&nbsp;<a href="http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf">faster overall</a>. But more broadly, Cassandra's fault tolerant design is what allows the slowest 1% of requests to be less than twice as slow as the average.

Cassandra vs failures, latency edition

Jonathan EllisTechnology

Share

Share

More Technology

Knowledge Graphs for RAG without a GraphDB

How Winweb Built its AI Assistant with DataStax Astra DB and LangChain

Vercel + Astra DB: Get Data into Your GenAI Apps Fast

Simplifying Agent Development with Astra DB Connector for Vertex AI Search

One-stop Data API for Production GenAI