Jonathan Ellis

<p>Rapid read protection allows Cassandra to tolerate node failure without dropping a single request. We designed it for 2.0, but it&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5932">took some extra time</a>&nbsp;to get the corner cases worked out. It's finished now for the upcoming Cassandra 2.0.2 release.</p>

<p>It's easier to explain how it works after showing what it does. Here's a graph of a small four-node Cassandra cluster, with a replication factor of three, being stress-tested under five different rapid read protection settings and with none at all:</p>
<img alt="Node death" data-align="center" data-entity-type="file" data-entity-uuid="6d45a1ef-7d74-4093-bd38-74d34914b404" src="https://www.datastax.com/sites/default/files/inline-images/5932-node-death.png" />
<p>The dip is where we killed one of the nodes with prejudice. With rapid read protection disabled, traffic comes to a standstill until failure detection takes the dead node out of service for client requests.</p>

<h3>Why is there a disruption without rapid read protection?</h3>

<p>Cassandra performs only as many requests as necessary to meet the requested ConsistencyLevel: one request for ConsistencyLevel.ONE, two for ConsistencyLevel.QUORUM (with three replicas), and so forth. Cassandra uses the&nbsp;<a href="https://www.datastax.com/dev/blog/dynamic-snitching-in-cassandra-past-present-and-future">dynamic snitch</a>&nbsp;to route requests to the most-responsive replica.</p>
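<p>The request counts above follow from how many replica responses each level needs. As a minimal sketch, assuming only the levels named in this post and the usual majority rule for QUORUM (floor(RF / 2) + 1), the arithmetic looks like this. The function name <tt>replicas_needed</tt> is hypothetical, not a Cassandra API:</p>

```python
def replicas_needed(consistency_level: str, replication_factor: int) -> int:
    """Hypothetical helper: replica responses a coordinator must collect."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {consistency_level}")


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, replicas_needed(level, 3))
```

<p>With three replicas this gives 1, 2 and 3 requests respectively, which is why a single slow or dead replica can stall a read: there is no spare request in flight.</p>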

<p>In this diagram, we see the client asking a Cassandra node for some data (1). This node, the request&nbsp;<em>coordinator</em>, then routes it to the best-performing replica (2) and relays the response back to the client (3, 4).</p>
<img alt="Client asks cassandra node for data" data-align="center" data-entity-type="file" data-entity-uuid="c28ac501-cb4d-41a7-ad9a-1490b4e3243e" src="https://www.datastax.com/sites/default/files/inline-images/Screen-Shot-2013-10-03-at-11.06.36-PM.png" />
<p>This gives Cassandra maximum throughput, but at the cost of some fragility: if the replica to which the request is routed fails before responding, the request will time out:</p>
<img alt="request times out" data-align="center" data-entity-type="file" data-entity-uuid="d16ca4d0-81af-4e8e-8494-cdcf182e6403" src="https://www.datastax.com/sites/default/files/inline-images/Screen-Shot-2013-10-03-at-11.08.28-PM.png" />
<p>Rapid read protection allows the coordinator to monitor the outstanding requests and send redundant requests to other replicas when the original is slower than expected:</p>
<img alt="replica dies" data-align="center" data-entity-type="file" data-entity-uuid="d62cf37b-49a4-40ce-a557-59cada7407d1" src="https://www.datastax.com/sites/default/files/inline-images/Screen-Shot-2013-10-03-at-11.09.43-PM.png" />
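<p>The coordinator behavior in the last diagram can be sketched with Python's asyncio. This is a minimal illustration, not Cassandra's implementation: the replica names, delays, and the helpers <tt>read_from_replica</tt> and <tt>speculative_read</tt> are hypothetical, and a replica that has died is simulated as one that takes far too long to respond.</p>

```python
import asyncio
from typing import Sequence, Tuple


async def read_from_replica(name: str, delay_s: float) -> str:
    """Hypothetical stand-in for a replica read that takes delay_s seconds."""
    await asyncio.sleep(delay_s)
    return f"row from {name}"


async def speculative_read(replicas: Sequence[Tuple[str, float]], retry_after_s: float) -> str:
    """Ask the preferred replica first; if it has not answered within
    retry_after_s, send a redundant request to the next replica and
    return whichever response arrives first."""
    name, delay_s = replicas[0]
    pending = {asyncio.create_task(read_from_replica(name, delay_s))}
    done, pending = await asyncio.wait(pending, timeout=retry_after_s)
    if not done:
        # The original request is slower than expected: send a redundant one.
        if len(replicas) > 1:
            backup_name, backup_delay_s = replicas[1]
            pending.add(asyncio.create_task(read_from_replica(backup_name, backup_delay_s)))
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    return next(iter(done)).result()


if __name__ == "__main__":
    # replica-A has "died" (simulated as a 10 s stall); replica-B answers in 10 ms.
    result = asyncio.run(speculative_read([("replica-A", 10.0), ("replica-B", 0.01)], retry_after_s=0.05))
    print(result)  # prints: row from replica-B
```

<p>Returning the first response keeps the sketch short; for a level such as QUORUM, a coordinator needs enough responses to satisfy the level, not just the first one.</p>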
<h3>Configuring rapid read protection</h3>

<p>Rapid read protection can be configured to retry after a fixed number of milliseconds or after a percentile of the typical read latency (tracked per table). For example,</p>

<pre>
ALTER TABLE users WITH speculative_retry = '10ms';</pre>

<p>Or,</p>

<pre>
ALTER TABLE users WITH speculative_retry = '99percentile';</pre>

<p>(For those familiar with Hadoop, this is similar to&nbsp;<a href="https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/task-execution">speculative execution</a>, applied to much shorter request latencies. &nbsp;It is the same idea as Jeff Dean's&nbsp;<a href="http://cacm.acm.org/magazines/2013/2/160173-the-tail-at-scale/fulltext">hedged requests</a>, which with all due respect is an even worse name. &nbsp;<a href="http://www.quora.com/Computer-Science/Why-is-naming-things-hard-in-computer-science-and-how-can-it-can-be-made-easier">There are two hard problems in Computer Science...</a>)</p>
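<p>To make the two forms of <tt>speculative_retry</tt> concrete, here is a hypothetical sketch of turning the setting into a wait time. It assumes the percentile is taken over recently recorded read latencies for the table using the nearest-rank method; Cassandra's own latency tracking may work differently, and the function name is illustrative. <tt>ALWAYS</tt> is discussed below; treating <tt>NONE</tt> as "no protection" is an assumption of this sketch.</p>

```python
import math
import re
from typing import Optional, Sequence


def retry_threshold_ms(option: str, latencies_ms: Sequence[float]) -> Optional[float]:
    """Hypothetical sketch: how long to wait (ms) before sending a redundant read.

    0.0 means retry immediately (ALWAYS); None means never retry (NONE).
    Percentiles use the nearest-rank method over the recorded latencies.
    """
    value = option.strip().upper()
    if value == "ALWAYS":
        return 0.0
    if value == "NONE":
        return None
    fixed = re.fullmatch(r"(\d+(?:\.\d+)?)MS", value)
    if fixed:
        return float(fixed.group(1))
    pct = re.fullmatch(r"(\d+(?:\.\d+)?)PERCENTILE", value)
    if pct:
        p = float(pct.group(1))
        if not latencies_ms or not 0 < p <= 100:
            raise ValueError("need latency samples and a percentile in (0, 100]")
        ordered = sorted(latencies_ms)
        rank = math.ceil(p * len(ordered) / 100)  # nearest rank, 1-based
        return ordered[rank - 1]
    raise ValueError(f"unrecognized speculative_retry value: {option!r}")


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 1001)]  # made-up latencies, 1..1000 ms
    for setting in ("10ms", "99percentile", "ALWAYS"):
        print(setting, retry_threshold_ms(setting, samples))
```

<p>A fixed delay is predictable but ignores how fast the table actually is; a percentile adapts as the table's latency changes.</p>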

<p>By default, 2.0.2 will use the 99th percentile. This strikes a good balance between performing few extra requests (only about 1% more than with no protection at all) and still dealing with the worst problems. As you can see above, extra requests are not free: the more aggressive settings of 75th percentile and&nbsp;<tt>ALWAYS</tt>&nbsp;have noticeably lower throughput.</p>
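<p>The "1% more" figure follows from the definition of a percentile: roughly (100 - P)% of reads are slower than the Pth percentile, and each of those triggers one redundant request. This sketch over made-up, evenly spread latencies shows the arithmetic; it assumes the threshold is exactly the observed percentile and ignores the extra load the retries themselves add. Function names are hypothetical.</p>

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def extra_request_fraction(samples: Sequence[float], p: float) -> float:
    """Fraction of reads slower than the p-th percentile threshold,
    i.e. reads that would trigger one redundant request."""
    threshold = nearest_rank_percentile(samples, p)
    slow = sum(1 for s in samples if s > threshold)
    return slow / len(samples)


if __name__ == "__main__":
    latencies = [float(ms) for ms in range(1, 1001)]  # hypothetical: 1..1000 ms
    for p in (75, 90, 99):
        print(p, extra_request_fraction(latencies, p))  # 0.25, 0.1, 0.01
```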

<p>The 90th percentile can also be a reasonable setting if you want to be more aggressive about reducing latency (see below) while still having a relatively small impact on throughput.</p>

<h3>Reducing latency variance with rapid read protection</h3>

<p>Rapid read protection also helps reduce latency variance in the face of less drastic events than complete node failure. Here's a scenario where we start a full, un-<a href="https://www.datastax.com/dev/blog/six-mid-series-changes-to-know-about-in-1-2-x">throttled</a>&nbsp;compaction across the cluster. Throughput doesn't improve dramatically since all replicas are equally affected, but notice how much better 99.9th percentile latency is with rapid read protection:</p>
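<p>A toy model shows why the tail improves: once a redundant request is sent, the client gets whichever response arrives first, so a read that would have stalled costs at most the retry delay plus the backup replica's latency. The numbers below are invented for illustration (they are not the benchmark data behind the graph), and the model assumes the backup replica is not stalled at the same moment as the first one.</p>

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    ordered = sorted(samples)
    return ordered[max(math.ceil(p * len(ordered) / 100), 1) - 1]


def hedged_latency(primary_ms: float, backup_ms: float, retry_after_ms: float) -> float:
    """Latency seen by the client when a redundant read goes to a backup
    replica once the first request has been outstanding for retry_after_ms."""
    if primary_ms <= retry_after_ms:
        return primary_ms  # answered in time; no redundant request sent
    return min(primary_ms, retry_after_ms + backup_ms)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 reads, 2 of which hit a stalled replica (250 ms).
    primary = [2.0] * 998 + [250.0] * 2
    backup = [3.0] * 1000  # assume the backup replica is not stalled
    hedged = [hedged_latency(p, b, 10.0) for p, b in zip(primary, backup)]
    print("p99.9 without protection:", percentile(primary, 99.9))  # 250.0
    print("p99.9 with protection:", percentile(hedged, 99.9))      # 13.0
```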
<img alt="Node compaction" data-align="center" data-entity-type="file" data-entity-uuid="cf9b358d-7b5a-4760-b894-f9e4494b041d" src="https://www.datastax.com/sites/default/files/inline-images/5932-compaction1.png" />
<h3>Some more subtle points</h3>

<ul>
	<li>Rapid read protection does not help at all with&nbsp;<tt>ConsistencyLevel.ALL</tt>&nbsp;reads, since there are no other replicas to retry against: by definition, responses from all replicas are required.</li>
	<li>When the node is killed in the first graph, throughput still dips with read protection. This is because in our small four-node cluster, we've lost 25% of our capacity and have to redo those requests almost all at once, causing a brief load spike on the surviving nodes. The larger your cluster is, the smaller the impact will be. (This is another benefit of spreading replication throughout the cluster with virtual nodes.)</li>
	<li>If you look closely, you'll see that the throughput in the first graph actually recovers to a&nbsp;<em>higher</em>&nbsp;level than initially. This is because we've&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-5932?focusedCommentId=13780003&amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13780003">forced random replica choice</a>&nbsp;to eliminate the dynamic snitch's influence here, so we go from about 1/4 of reads being satisfied locally on the coordinator node to about 1/3.</li>
	<li>In the last graph, latency variance is actually higher for the&nbsp;<tt>ALWAYS</tt>&nbsp;setting. This is because so many extra reads have pushed us up against our cluster's capacity ceiling.&nbsp;<tt>ALWAYS</tt>&nbsp;is only recommended if you're sure you'll have the capacity to spare!</li>
</ul>
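<p>The jump from about 1/4 to about 1/3 of reads served locally can be checked by counting cases. This sketch assumes data is spread evenly so that each of the four possible 3-node replica sets owns a quarter of it, clients spread requests evenly over the live nodes, and replicas are chosen purely at random, as in the benchmark. The function name is hypothetical.</p>

```python
from fractions import Fraction
from itertools import combinations
from typing import Iterable, Sequence


def local_read_fraction(live_nodes: Sequence[int], replica_sets: Sequence[Iterable[int]]) -> Fraction:
    """Probability that a read is served by the coordinator itself, assuming
    (1) each replica set owns an equal share of the data, (2) clients spread
    requests evenly over live nodes, and (3) the coordinator picks uniformly
    at random among the key's live replicas."""
    total = Fraction(0)
    for replicas in replica_sets:
        live_replicas = [r for r in replicas if r in live_nodes]
        for coordinator in live_nodes:
            if coordinator in live_replicas:
                total += Fraction(1, len(live_replicas))
    return total / (len(live_nodes) * len(replica_sets))


if __name__ == "__main__":
    # Four nodes, replication factor 3: every 3-node combination is a replica set.
    sets = list(combinations(range(4), 3))
    print(local_read_fraction([0, 1, 2, 3], sets))  # prints: 1/4
    print(local_read_fraction([0, 1, 2], sets))     # node 3 killed; prints: 1/3
```

<p>Before the failure, the coordinator holds a replica three times out of four and then picks itself one time in three (3/4 × 1/3 = 1/4). Afterwards, every key works out to 1/3, whether it has two live replicas (2/3 × 1/2) or three (1 × 1/3).</p>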


Rapid read protection in Cassandra 2.0.2
