Anticompaction in Cassandra 2.1
Cassandra 2.1 introduces incremental repairs, which make repair much more lightweight because data that has already been repaired is not repaired again. Anticompaction is one of the things that makes incremental repairs possible. This blog post aims to explain what anticompaction is and how it affects regular compaction.
The concept of incremental repairs is explained in more detail in this post, but the general idea is that we mark every SSTable involved in a repair with a timestamp indicating when it was repaired. We then simply exclude those SSTables from the next repair, since data that was once repaired stays repaired.
Since SSTables can contain any range, we need to split out the ranges that were actually repaired; this is called anticompaction. It means that one SSTable is split in two: one containing repaired data and one containing unrepaired data.
If the range in the SSTable is fully contained within the range that was repaired, we don't actually rewrite the SSTable. Instead, we just change the SSTable metadata to indicate when it was repaired.
Since we now have two sets of SSTables that we can't compact together, we need to make some adjustments to our compaction strategies. The reason we can't compact them together is that we would lose the repaired status if we merged a repaired SSTable with an unrepaired one.
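To make the two cases concrete, here is a minimal sketch of the idea in Python. It is not Cassandra's implementation: the `SSTable` class, the `anticompact` function, the single inclusive token range, and the use of `0` for "never repaired" are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

UNREPAIRED = 0  # assumption: 0 stands for "never repaired"


@dataclass
class SSTable:
    """Hypothetical stand-in for an SSTable: just tokens plus repair metadata."""
    tokens: List[int]
    repaired_at: int = UNREPAIRED


def anticompact(sstable: SSTable, repaired_range: Tuple[int, int],
                repaired_at: int) -> List[SSTable]:
    """Split one SSTable into a repaired part and an unrepaired part."""
    lo, hi = repaired_range  # assumption: one inclusive range, no wraparound
    inside = [t for t in sstable.tokens if lo <= t <= hi]
    outside = [t for t in sstable.tokens if not (lo <= t <= hi)]

    if not outside:
        # Fully contained in the repaired range: no rewrite,
        # only the metadata changes.
        sstable.repaired_at = repaired_at
        return [sstable]
    if not inside:
        # Nothing in this SSTable was repaired: leave it alone.
        return [sstable]
    # Partially covered: rewrite into one repaired and one unrepaired SSTable.
    return [SSTable(inside, repaired_at),
            SSTable(outside, sstable.repaired_at)]
```

For example, `anticompact(SSTable([5, 15]), (0, 10), 1000)` yields one SSTable holding token 5 marked as repaired at 1000 and one holding token 15 that remains unrepaired.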
Size tiered compaction
With size tiered compaction it is pretty simple: we split the SSTables into two sets, one with repaired and one with unrepaired SSTables. Then we try to find compaction candidates within each of those two sets and run the compaction on the candidates that would have the biggest benefit.
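The selection can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual strategy code: `SSTableInfo`, `size_buckets`, and `next_compaction` are hypothetical names, and "biggest benefit" is approximated here by simply picking the bucket with the most SSTables. The bucket bounds (0.5 and 1.5) and the minimum of 4 SSTables mirror the size tiered defaults.

```python
from collections import namedtuple

# Hypothetical record; repaired_at == 0 is assumed to mean "unrepaired".
SSTableInfo = namedtuple("SSTableInfo", ["name", "size", "repaired_at"])


def size_buckets(sstables, low=0.5, high=1.5):
    """Group SSTables whose size is within [low, high] times a bucket's average."""
    buckets = []
    for s in sorted(sstables, key=lambda s: s.size):
        for bucket in buckets:
            avg = sum(x.size for x in bucket) / len(bucket)
            if low * avg <= s.size <= high * avg:
                bucket.append(s)
                break
        else:
            buckets.append([s])
    return buckets


def next_compaction(sstables, min_threshold=4):
    """Pick candidates within the repaired and unrepaired sets, never across them."""
    repaired = [s for s in sstables if s.repaired_at > 0]
    unrepaired = [s for s in sstables if s.repaired_at == 0]
    candidates = []
    for group in (repaired, unrepaired):
        candidates += [b for b in size_buckets(group) if len(b) >= min_threshold]
    # Assumption: more SSTables in a bucket stands in for "biggest benefit".
    return max(candidates, key=len, default=[])
```

Note that three repaired and two unrepaired SSTables of the same size produce no candidate here, even though five similar-sized SSTables exist in total, because the two sets are bucketed separately.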
This means major compaction will now create two SSTables instead of one. If major compaction was being used to help clear out tombstones, it should still work just as well as before. We can drop a tombstone if the SSTables included in the compaction contain the tombstone plus all older occurrences of the key. Since the unrepaired set of SSTables will almost always only include newer data, it shouldn't prevent dropping the tombstone as the tombstone wouldn't cover any data there.
Leveled compaction
For leveled compaction we do leveling on the repaired SSTables and then size tiered compaction on the unrepaired ones. This means that once you do an incremental repair you will have to continue doing them; otherwise you will not run leveled compaction, just size tiered. (There are ways to clear out the repair state to revert this; more about that later.)
The complicated part is how to migrate to using incremental repairs: we only want to separate repaired and unrepaired SSTables after the first incremental repair has been run, and before that we want leveling across all of the SSTables.
After the incremental repair is done, we iterate over the SSTables included in the repair and run anticompaction on them one at a time. This means that after the first SSTable has been anticompacted, we will have to move all the currently leveled but unrepaired SSTables into the unrepaired set and end up with only the first repaired and anticompacted SSTable in the leveling and possibly thousands in the unrepaired set. After that we continue and anticompact the rest of the SSTables which were included in the repair.
We do a few things to make it better though. First, when we clear out the unrepaired SSTables from the leveling, we keep the original SSTable level to make it possible to re-add the SSTable at its original position after it has been anticompacted. For example, if an SSTable is in level 3 before anticompaction, it is likely that we can add it in level 3 after the anticompaction. This is especially important because we anticompact one SSTable at a time during an anticompaction session, so many SSTables will sit in the unrepaired set only temporarily: they may already have been repaired, just not yet anticompacted.
Running the first incremental repair will affect many nodes at the same time. To avoid that, there is a way to migrate one node at a time, though it requires a bit of manual labour:
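A rough sketch of the "remember the level" idea, under assumptions: the `Table` and `Leveling` classes are hypothetical, and falling back to level 0 when the original level now contains an overlapping SSTable is an assumed policy chosen for illustration (leveled compaction keeps SSTables within a level above 0 non-overlapping).

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical SSTable descriptor with its token span and level."""
    name: str
    first: int
    last: int
    level: int = 0
    repaired: bool = False


class Leveling:
    def __init__(self, level_count=9):
        self.levels = [[] for _ in range(level_count)]

    def add(self, table):
        """Add at the table's remembered level if it fits, else at level 0."""
        target = table.level
        overlaps = any(o.first <= table.last and table.first <= o.last
                       for o in self.levels[target])
        if target > 0 and overlaps:
            target = 0  # assumption: fall back to L0 on overlap
        table.level = target
        self.levels[target].append(table)
        return target

    def evict_unrepaired(self):
        """Move unrepaired tables out of the leveling, keeping table.level."""
        evicted = []
        for level in self.levels:
            for table in [t for t in level if not t.repaired]:
                level.remove(table)
                evicted.append(table)  # table.level is deliberately untouched
        return evicted
```

Because `evict_unrepaired` leaves `level` untouched, a table that was in level 3 can be passed back to `add` once it has been anticompacted and marked repaired, and it lands in level 3 again as long as nothing overlapping has taken its place.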
- Disable compaction on the node (nodetool disableautocompaction)
- Run a classic full repair
- Stop the node
- Use the tool sstablerepairedset to mark as repaired all the SSTables that were created before you did step 1
- Restart Cassandra
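For the step that marks SSTables, you need the list of SSTables created before compaction was disabled. A small helper along these lines could build that list. It is only a sketch: the function name and the idea of using file modification time as a proxy for creation time are assumptions, and it relies on SSTable data files ending in `-Data.db` and on snapshot and backup copies living under `snapshots` and `backups` directories.

```python
from pathlib import Path


def sstables_created_before(data_dir, cutoff_epoch_seconds):
    """Return SSTable data files last modified before the cutoff time.

    Assumption: SSTables are immutable, so modification time is treated
    as creation time. Snapshot and backup copies are skipped.
    """
    selected = []
    for path in Path(data_dir).rglob("*-Data.db"):
        if "snapshots" in path.parts or "backups" in path.parts:
            continue
        if path.stat().st_mtime < cutoff_epoch_seconds:
            selected.append(str(path))
    return sorted(selected)
```

The cutoff would be the time you noted when you disabled compaction in step 1. Review the resulting list before passing any of the files to sstablerepairedset.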
If you run regular repairs you could note when you last ran a full repair on the node and use that time. SSTables are immutable, meaning if an SSTable has not changed since the repair started, it is still repaired. Note that you need to check when you last ran a full repair (not -pr) and you will need to do it on every node.
If you want to stop using incremental repairs and are running leveled compaction, this tool can also be used to clear out the repaired state on an SSTable. Stop the node, run the command tools/bin/sstablerepairedset --is-unrepaired <sstable> on all SSTables, and restart. All your data will then be leveled again.