DataStax Developer Blog

Tombstone removal improvement in 1.2

By Yuki Morishita - July 2, 2012

The post “When to use Leveled Compaction” mentions “more frequent tombstone removal” in the upcoming 1.2 release. Today I will explain the motivation for this improvement and how it works behind the scenes.

Motivation

Size-tiered compaction gathers SSTables of similar size and compacts them together into one larger SSTable. Because of this, large SSTables rarely find compaction peers of similar size and tend to be left untouched over time. If those large SSTables contain a lot of TTLed columns and tombstones, the chance of removing the tombstones from disk during periodic compaction is low, and you have to run user-defined compaction manually in order to recover disk space.

Tombstones (marked in gray) in a large SSTable are left behind over time

More frequent tombstone removal

What if we knew how many tombstones could be removed at a given time? If we did, we could compact a large SSTable on its own to remove those tombstones automatically. That’s what CASSANDRA-3442 is about. Starting with version 1.2, Cassandra tracks the droppable time of tombstones for all TTLed/deleted columns and performs a standalone compaction on any SSTable whose ratio of droppable tombstones to all columns is above a certain threshold. The threshold defaults to 0.2 (20%), and you can configure it by providing the compaction parameter tombstone_threshold when creating a column family.
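The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison against tombstone_threshold, not Cassandra's actual code; the function name and its arguments are hypothetical, and "above the threshold" is assumed to mean strictly greater.

```python
# Default from the post: 20% of all columns.
DEFAULT_TOMBSTONE_THRESHOLD = 0.2


def should_compact_alone(droppable_tombstones, total_columns,
                         threshold=DEFAULT_TOMBSTONE_THRESHOLD):
    """Return True if an SSTable qualifies for a standalone compaction.

    Hypothetical helper: both counts are assumed to be already known
    (or estimated) for a single SSTable.
    """
    if total_columns <= 0:
        # An empty SSTable has nothing to reclaim.
        return False
    ratio = droppable_tombstones / total_columns
    # Assumption: the ratio must be strictly above the threshold.
    return ratio > threshold
```

For example, an SSTable with 30 droppable tombstones out of 100 columns would qualify under the default threshold, while one with exactly 20 out of 100 would not.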

Behind the scenes

Each SSTable carries statistics about the data written to it, such as histograms of row size and column count, inside its Stats.db file. Starting with version 1.2, when writing an SSTable to disk, Cassandra tracks the tombstone droppable time for all TTLed/deleted columns and stores it in Stats.db. We cannot store every one of those values, because the number of columns sometimes goes up to billions. So instead, Cassandra constructs a histogram from the stream of columns using the algorithm described in this paper. The resulting histogram looks like the one below.
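To give a feel for how a histogram can be built from a stream without keeping every value, here is a minimal Python sketch. It assumes a common bounded-bin approach: keep at most a fixed number of bins and, whenever that limit is exceeded, merge the two closest bins into their weighted average. Whether this matches the referenced paper or Cassandra's implementation in every detail is an assumption; the class and method names are hypothetical.

```python
import bisect


class StreamingHistogram:
    """Bounded-size histogram over a stream of values (illustrative sketch)."""

    def __init__(self, max_bins):
        if max_bins < 1:
            raise ValueError("max_bins must be at least 1")
        self.max_bins = max_bins
        self.points = []  # bin centroids, kept sorted
        self.counts = []  # counts[i] belongs to points[i]

    def update(self, point, count=1):
        """Add a value (e.g. a tombstone drop time) to the histogram."""
        i = bisect.bisect_left(self.points, point)
        if i < len(self.points) and self.points[i] == point:
            self.counts[i] += count
        else:
            self.points.insert(i, point)
            self.counts.insert(i, count)
        if len(self.points) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Find the adjacent pair of bins with the smallest gap.
        j = min(range(len(self.points) - 1),
                key=lambda k: self.points[k + 1] - self.points[k])
        c1, c2 = self.counts[j], self.counts[j + 1]
        # Replace both bins with their count-weighted average.
        merged = (self.points[j] * c1 + self.points[j + 1] * c2) / (c1 + c2)
        self.points[j:j + 2] = [merged]
        self.counts[j:j + 2] = [c1 + c2]
```

With this design, memory use is bounded by max_bins no matter how many columns are written, at the cost of the bin positions being approximate.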

Tombstone drop time histogram

Here, the horizontal axis represents buckets of Unix timestamps at which tombstones become droppable, and the vertical axis shows the number of droppable tombstones in each time range. Using this histogram combined with the total number of columns, we can estimate the ratio of tombstones that are droppable at the time of compaction.
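As a rough illustration of that estimate, the Python sketch below sums the counts of all buckets whose drop time is already in the past and divides by the total number of columns. It is a simplification under stated assumptions: whole buckets are counted with no interpolation inside a bucket, and the names estimate_droppable_ratio and gc_before are hypothetical.

```python
def estimate_droppable_ratio(histogram, total_columns, gc_before):
    """Estimate the fraction of columns that are droppable tombstones.

    histogram: iterable of (drop_time, count) pairs, where drop_time is a
               Unix timestamp bucket and count is the number of tombstones
               that become droppable around that time.
    gc_before: Unix timestamp; tombstones with a drop time earlier than
               this are treated as droppable (an assumption of this sketch).
    """
    if total_columns <= 0:
        return 0.0
    droppable = sum(count for drop_time, count in histogram
                    if drop_time < gc_before)
    return droppable / total_columns
```

For instance, with buckets (1000, 5), (2000, 10), and (3000, 20), 100 columns in total, and gc_before set to 2500, only the first two buckets count, giving an estimated ratio of 15 / 100.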

Wrap up

In version 1.2, Cassandra performs standalone compaction based on the droppable tombstone ratio of an SSTable. With this improvement, tombstones are removed from disk more frequently. The equivalent work for the leveled compaction strategy is still under development, but it is also expected in 1.2.


