Technology•February 27, 2014

More Efficient Repairs in 2.1

Repairs are important for every Cassandra cluster, especially when frequently deleting data. Running the nodetool repair command initiates the repair process on a specific node which in turn computes a Merkle tree for each range of data on that node. The merkle tree is a binary tree of hashes used by Cassandra for calculating the differences in datasets between nodes in a cluster. Every time a repair is carried out, the tree has to be calculated, each node that is involved in the repair has to construct its merkle tree from all the sstables it stores making the calculation very expensive. This allows for repairs to be network efficient as only targeted rows identified by the merkle tree as inconsistencies are sent across the network.

Scanning every sstable to allow for the creation of merkle trees is an expensive operation. To avoid the need for constant tree construction incremental repairs are being introduced in Cassandra 2.1. The idea is to persist already repaired data, and only calculate merkle trees for sstables that haven’t previously undergone repairs allowing the repair process to stay performant and lightweight even as datasets grow so long as repairs are run frequently.

Incremental repairs begin with the repair leader sending out a prepare message to its peers. Each node builds a merkle tree from the unrepaired sstables, which it can distinguish by the new repairedAt field in each sstable’s metadata. Once the leader receives a merkle tree from each node, it compares the trees and issues streaming requests, just as in the classic repair case. Finally, the leader issues an anticompaction command. Anticompaction is the process of segregating repaired and unrepaired ranges into separate sstables; repaired sstables are written with a new repairedAt field denoting the time of repair. Since sstable are not locked against compaction during the repair, they might get removed via compaction before the process completes. This costs us some efficiency, since they will be repaired again later, but does not harm correctness.

Compaction with incremental repairs

Maintaining separate pools of repaired and unrepaired sstables causes some extra complexity for compaction to deal with. For example, in the diagram below we repair a range covering half of the initial sstable. After repair, anticompaction splits it into a set of repaired and unrepaired sstables, at which point leveled and size-tiered compaction strategies handle segregation of the data differently.

Size-Tiered compaction takes a simple approach of splitting repaired and unrepaired sstables into separate pools, each of which is compacted independently. Leveled compaction simply performs size-tiered compaction on unrepaired data, moving into the proper levels after repair. This cuts down on write amplification compared to maintaining two leveling pools.

Migrating to incremental repairs

Full repairs remain the default, largely so Cassandra doesn't have to guess the repaired state of existing sstables. Guessing that everything is fully repaired is obviously problematic; guessing that nothing is repaired is less obviously so: LCS would start size-tiering everything, since that is what it does now with unrepaired data! To avoid this, compaction remains unchanged until incremental repair is first performed and compaction detects sstables with the repairedAt flag.

Incremental repairs can be opted into via the -inc option to nodetool repair. This is compatible with both sequential and parallel (-par) repair, e.g., bin/nodetool -par -inc <ks> <cf>. When an sstable is fully covered by a repaired range, no anticompaction will occur, it will just rewrite the repairedAt field in sstable metadata. Recovering from missing data or corrupted sstables will require a non-incremental full repair. (For more on anticompaction, see Marcus's post here.)

Effect of tools / commands on repair status

Since the sstable’s repair status is now tracked via it’s metadata, understanding how the set of tools provided with open-source Cassandra can impact this repair status becomes important.

Bulk Loading - even if repaired in a different cluster, loaded tables will be unrepaired.
Scrubbing - if scrubbing results in dropping rows, new sstables will be become unrepaired, however if no bad rows are detected, the sstable will keep its original repairedAt field.
Major compaction - STCS will combine each of its pools into a single sstable, one repaired and one not. Major compaction continues to have no effect under LCS.
Setting Repaired Status - a new tool added in 2.1 beta 2 can be found in tools/bin/sstablerepairedset that allows users to mark an sstable as repaired manually allowing for an easy migration to using incremental repairs by using the sstablerepairedset --is-repaired <sstable> command. It's important to only use this tool on repaired sstables, the status of an sstable can be checked via the /tools/bin/sstablemetadata tool by looking at the repairedAt field.

JUMP TO SECTION

More Technology

View All

How to Support Streaming in AI Applications with LangChain and Langflow

Technology • October 22, 2024

One-stop Data API for Production GenAI

Astra DB gives JavaScript developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.

Learn More

Get Started for Free

More Efficient Repairs in 2.1

Compaction with incremental repairs

Migrating to incremental repairs

Effect of tools / commands on repair status

Share

Share

Compaction with incremental repairs

Migrating to incremental repairs

Effect of tools / commands on repair status

More Technology

How to Support Streaming in AI Applications with LangChain and Langflow

Building Knowledge Graphs at Production Scale for GenAI

GenAI in Retail: How to Build a Shopping Assistant with Langflow

Apache Cassandra 5.0 and DataStax: The Benefits of Staying in Sync

One-stop Data API for Production GenAI

Subscribe to AI++