Tombstone removal improvement in 1.2
In the post “When to use Leveled Compaction”, there was a mention of “more frequent tombstone removal” in the upcoming 1.2 release. Today I will explain the motivation for this improvement and how it works behind the scenes.
Size-tiered compaction gathers SSTables of similar size and compacts them together into one big SSTable. Because of this, larger SSTables have a hard time finding compaction peers and tend to be left untouched over time. If those large SSTables contain a lot of TTLed columns and tombstones, the chance of removing tombstones from disk during periodic compaction is low, and you have to run a user-defined compaction manually in order to recover disk space.
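To make the problem concrete, here is a minimal sketch of size-based bucketing. It is not Cassandra's implementation: the function names are hypothetical, and the `bucket_low`, `bucket_high` and `min_threshold` values are assumptions chosen for illustration. It only shows how a single large SSTable can end up alone in its bucket and never qualify for compaction.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size (illustrative only)."""
    buckets = []  # each entry is [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            # A size joins a bucket when it is close to the bucket's average.
            if avg * bucket_low < size < avg * bucket_high:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            # No bucket was close enough, so start a new one.
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def compaction_candidates(sizes, min_threshold=4):
    """Return only the buckets with enough peers to be compacted together."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


# Four small SSTables and one very large one (sizes in MB, made up).
sstable_sizes = [10, 11, 12, 13, 5000]
buckets = bucket_by_size(sstable_sizes)
candidates = compaction_candidates(sstable_sizes)
```

In this example the four small SSTables share a bucket and become a compaction candidate, while the 5000 MB SSTable sits in a bucket of its own and is skipped, tombstones included.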
More frequent tombstone removal
What if we knew how many tombstones could be removed at a given time? If we did, we could compact a large SSTable on its own to remove those tombstones automatically. That’s what CASSANDRA-3442 is about. From version 1.2, Cassandra tracks the time at which the tombstone of every TTLed or deleted column becomes droppable, and performs a standalone compaction on any SSTable whose ratio of droppable tombstones to all columns is above a certain threshold. The threshold defaults to 0.2 (20%), and you can configure it by providing the compaction parameter tombstone_threshold when creating a column family.
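The decision itself is a simple ratio check. The sketch below is only an illustration of the rule described above: the function and argument names are hypothetical, and only the 0.2 default and the tombstone_threshold name come from the text.

```python
DEFAULT_TOMBSTONE_THRESHOLD = 0.2  # default mentioned above (20%)


def should_compact_alone(droppable_tombstones, total_columns,
                         tombstone_threshold=DEFAULT_TOMBSTONE_THRESHOLD):
    """Return True when an SSTable is worth compacting on its own.

    Hypothetical helper: the ratio of droppable tombstones to all columns
    must be above the threshold.
    """
    if total_columns == 0:
        return False  # an empty SSTable has nothing to drop
    ratio = droppable_tombstones / total_columns
    return ratio > tombstone_threshold


mostly_dead = should_compact_alone(300, 1000)   # ratio 0.3
mostly_live = should_compact_alone(100, 1000)   # ratio 0.1
stricter = should_compact_alone(300, 1000, tombstone_threshold=0.5)
```

Raising the threshold makes standalone compactions rarer, and lowering it reclaims space sooner at the cost of more compaction work.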
Behind the scenes
Each SSTable carries statistics, such as histograms of row size and column count for the written data, in its Stats.db file. From version 1.2, when writing an SSTable to disk, Cassandra tracks the tombstone droppable time for all TTLed/deleted columns and stores it in Stats.db. We cannot store every one of those values, because the number of columns sometimes goes up into the billions. So instead, Cassandra constructs a histogram from the stream of columns using the algorithm described in this paper. The resulting histogram looks like the one below.
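To give a feel for how a histogram can be built from a stream with bounded memory, here is a minimal sketch. It assumes a common approach for streaming histograms: keep at most a fixed number of bins and, when that limit is exceeded, merge the two closest bins into one at their weighted average. Whether this matches the referenced paper or Cassandra's code in every detail is an assumption, and the class and method names are hypothetical.

```python
class StreamingHistogram:
    """Fixed-size histogram built one point at a time (illustrative sketch)."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = {}  # bin position (e.g. droppable time) -> count

    def update(self, point, count=1):
        """Add a point, then shrink back to max_bins if needed."""
        self.bins[point] = self.bins.get(point, 0) + count
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        """Merge the two adjacent bins with the smallest gap between them."""
        keys = sorted(self.bins)
        i = min(range(len(keys) - 1), key=lambda j: keys[j + 1] - keys[j])
        p1, p2 = keys[i], keys[i + 1]
        c1, c2 = self.bins.pop(p1), self.bins.pop(p2)
        # The merged bin sits at the count-weighted average of the two.
        merged = (p1 * c1 + p2 * c2) / (c1 + c2)
        self.bins[merged] = self.bins.get(merged, 0) + c1 + c2


# Droppable times (made-up Unix timestamps) arriving one by one.
hist = StreamingHistogram(max_bins=3)
for droppable_time in [100, 200, 300, 110]:
    hist.update(droppable_time)
```

After the fourth point arrives, the bins at 100 and 110 are merged into one bin at 105 with a count of 2, so memory stays bounded no matter how many columns are written.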
Here, the horizontal axis represents buckets of Unix timestamps at which tombstones become droppable, and the vertical axis shows the number of droppable tombstones in each time range. Using this histogram together with the total number of columns, we can estimate the ratio of tombstones that are droppable at the time of compaction.
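The estimate can be sketched as follows. This is a simplified illustration, not Cassandra's code: it counts whole buckets whose droppable time has already passed, whereas a real implementation may interpolate within a partially covered bucket. The function name and the `gc_before` argument are hypothetical.

```python
def estimate_droppable_ratio(histogram, total_columns, gc_before):
    """Estimate the fraction of columns that are droppable tombstones.

    histogram: dict mapping a bucket's droppable time (Unix timestamp)
               to the number of tombstones in that bucket.
    gc_before: tombstones droppable at or before this time can be removed.
    """
    if total_columns == 0:
        return 0.0
    droppable = sum(count for time, count in histogram.items()
                    if time <= gc_before)
    return droppable / total_columns


# Made-up histogram: three buckets of droppable times.
histogram = {1000: 50, 2000: 150, 3000: 100}
ratio_now = estimate_droppable_ratio(histogram, total_columns=1000,
                                     gc_before=2500)
ratio_early = estimate_droppable_ratio(histogram, total_columns=1000,
                                       gc_before=500)
```

With `gc_before=2500`, the first two buckets are droppable (200 tombstones out of 1000 columns), and the third is not yet eligible.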
In version 1.2, Cassandra performs standalone compaction based on the droppable tombstone ratio of an SSTable. With this improvement, tombstones are removed from disk more frequently. The work for the leveled compaction strategy is still under development, but it is also expected in 1.2.