DataStax Developer Blog

Handling Disk Failures In Cassandra 1.2

By Aleksey Yeschenko - October 11, 2012 | 5 Comments

Cassandra is great at handling entire node failures. It’s not just robust, it’s almost indestructible.

But until Cassandra 1.2, a single unavailable disk has had the potential to make the whole replica unresponsive while still technically alive and part of the cluster: memtables become unable to flush, and the node eventually runs out of memory. Commitlog appends can also fail outright if you happen to lose the commitlog disk.

The traditional workaround has been to deploy on RAID 10 volumes, but as Cassandra handles increasingly large data sets, the prospect of sacrificing 50% of raw disk capacity on top of Cassandra’s own replication is becoming unpalatable: with a replication factor of 3 on mirrored volumes, every logical byte occupies six bytes of raw disk.

The upcoming Cassandra 1.2 release (currently in beta) fixes both of these issues by introducing a disk_failure_policy setting that allows you to choose from two policies that deal with disk failure sensibly: best_effort and stop. Here is how these work:

  • stop is the default behavior for new 1.2 installations. Upon encountering a file system error, Cassandra will shut down gossip and Thrift services, leaving the node effectively dead but still inspectable via JMX for troubleshooting.
  • best_effort: Cassandra will do its best in the face of disk errors. If it can’t write to a disk, that disk will be blacklisted for writes and the node will continue writing to its remaining data directories; if it can’t read from a disk, the disk will be marked unreadable and the node will continue serving data from the readable sstables only. Note that this makes it possible to serve stale data when the most recent version was on the now-inaccessible disk and the read is performed at consistency level ONE, so choose this option with care. In exchange, it lets you get the most out of your disks.

An ignore policy also exists for upgrading users: in this mode Cassandra will behave exactly as 1.1 and earlier versions did, logging all file system errors but otherwise ignoring them. DataStax recommends that users opt in to stop or best_effort instead. A minimal cassandra.yaml excerpt showing all three values follows.
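The option name and values below are those described above; the value shown is the default for new 1.2 installations, and the comments summarize each policy:

    # cassandra.yaml (Cassandra 1.2)
    #
    # Policy for handling file system errors:
    #   stop        - shut down gossip and Thrift; the node looks dead to
    #                 the cluster but remains inspectable via JMX
    #   best_effort - blacklist the failed disk and keep serving requests
    #                 from the remaining data directories
    #   ignore      - pre-1.2 behavior: log the error, otherwise ignore it
    disk_failure_policy: stop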

Summary

Starting with version 1.2, Cassandra will be able to react properly to a disk failure, either by stopping the affected node or by blacklisting the failed drive, depending on your availability/consistency requirements. This allows deploying Cassandra nodes with large disk arrays without the overhead of RAID 10.
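To make the dispatch concrete, here is a hypothetical sketch in Java (Cassandra’s implementation language). The class, method, and enum names are illustrative assumptions for this post, not Cassandra’s actual internals:

    // Illustrative sketch only: names and structure are assumptions,
    // not Cassandra's real implementation.
    enum DiskFailurePolicy { IGNORE, STOP, BEST_EFFORT }

    final class DiskFailureHandler {
        private final DiskFailurePolicy policy;

        DiskFailureHandler(DiskFailurePolicy policy) {
            this.policy = policy;
        }

        /** React to a file system error in the given data directory. */
        void onFileSystemError(java.io.File directory, boolean duringRead) {
            switch (policy) {
                case IGNORE:
                    // Pre-1.2 behavior: log the error and carry on.
                    System.err.println("FS error in " + directory + ", ignoring");
                    break;
                case STOP:
                    // The node looks dead to the cluster, but JMX stays
                    // up for troubleshooting.
                    stopGossipAndThriftServices();
                    break;
                case BEST_EFFORT:
                    // Blacklist the directory for writes only, or for both
                    // reads and writes, depending on the failure context,
                    // then keep serving from the remaining directories.
                    if (duringRead)
                        blacklistForReads(directory);
                    blacklistForWrites(directory);
                    break;
            }
        }

        private void stopGossipAndThriftServices() { /* elided */ }
        private void blacklistForReads(java.io.File dir)  { /* elided */ }
        private void blacklistForWrites(java.io.File dir) { /* elided */ }
    }

The point to notice is that only best_effort degrades a single directory rather than the whole node; stop sacrifices the node deliberately, and ignore preserves the risky pre-1.2 behavior.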



Comments

  1. Greg Lindahl says:

    You might find this blog post useful for ideas:

    http://highscalability.com/blog/2012/7/9/data-replication-in-nosql-databases.html

    I don’t think anything in it is really new; it describes a scheme similar to how Google handled their R3 databases from early on.

  2. Sankalp says:

    It would be nice if you could accept writes but forward reads to other replicas. The node could then run a repair in the background to fetch the lost data, and resume serving reads after that.

    This would be useful if you put a spare drive in the node: for the node to go offline, it would have to lose enough disks that it could no longer hold all the data.

  3. Sankalp says:

    How do you guys handle disk failures with memory mapped IO?

  4. Aleksey Yeschenko says:

    Sankalp: in the same way as we deal with other fs-related errors: ignore if disk_failure_policy is ‘ignore’, stop the node if the policy is ‘stop’, and blacklist the directory for either writes or for both reads and writes, depending on the context, if the policy is ‘best_effort’.

  5. Sachin says:

    Hi

    I wanted to understand how and when Cassandra detects that a drive has gone bad. If my disk_failure_policy is set to “stop”, how long will Cassandra remain oblivious to a disk problem before detecting it and then “stopping”?

    Cheers
