Tombstone removal improvement in 1.2

By Yuki Morishita - July 2, 2012

The post “When to use Leveled Compaction” mentions “more frequent tombstone removal” in the upcoming 1.2 release, so today I will explain the motivation for this improvement and what happens behind the scenes.


Size-tiered compaction gathers SSTables of similar size and compacts them together into one larger SSTable. Because of this, large SSTables have trouble finding compaction peers and tend to be left untouched for long periods. If those large SSTables contain many TTLed columns and tombstones, the chance of removing the tombstones from disk during periodic compaction is low, and you have to run a user-defined compaction manually to recover disk space.

Tombstones (marked in gray) in a large SSTable are left behind over time
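To make the "hard to find compaction peers" problem concrete, here is a simplified sketch of size-tiered bucketing. It is not Cassandra's code. The bucket bounds (0.5x to 1.5x of a bucket's average size) and the minimum of 4 SSTables per compaction are assumptions chosen for illustration, as are the function names. The example shows how one large SSTable ends up alone in its own bucket and is never selected.

```python
# Simplified, hypothetical sketch of size-tiered bucketing.
# bucket_low, bucket_high and min_threshold are illustrative assumptions.

def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes so each bucket holds sizes close to its running average."""
    buckets = []  # each entry: [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])  # update running average
                break
        else:
            # No bucket is close enough in size: start a new one.
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def compaction_candidates(sizes, min_threshold=4):
    """Return only the buckets holding enough similar-sized SSTables to compact."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


sizes_mb = [10, 11, 12, 9, 1000]
print(bucket_by_size(sizes_mb))         # [[9, 10, 11, 12], [1000]]
print(compaction_candidates(sizes_mb))  # [[9, 10, 11, 12]]
```

The 1000 MB SSTable has no similar-sized peers, so it never qualifies for compaction and any tombstones inside it stay on disk.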

More frequent tombstone removal

What if we knew how many tombstones could be removed at a given time? Then we could compact a large SSTable on its own to remove those tombstones automatically. That is what CASSANDRA-3442 is about. Starting with version 1.2, Cassandra tracks the time at which the tombstone of each TTLed or deleted column becomes droppable, and performs a standalone compaction on any SSTable whose ratio of droppable tombstones to total columns is above a certain threshold. The threshold defaults to 20% (0.2), and you can configure it with the compaction parameter tombstone_threshold when creating a column family.
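The decision described above boils down to a ratio compared against tombstone_threshold. The sketch below is a hypothetical illustration of that check, not Cassandra's implementation; the function name should_compact_alone is made up, and only the 0.2 default comes from this post.

```python
DEFAULT_TOMBSTONE_THRESHOLD = 0.2  # default value stated in this post


def should_compact_alone(droppable_tombstones, total_columns,
                         tombstone_threshold=DEFAULT_TOMBSTONE_THRESHOLD):
    """Hypothetical check: compact an SSTable by itself when the ratio of
    droppable tombstones to all of its columns exceeds the threshold."""
    if total_columns == 0:
        return False  # an empty SSTable has nothing to drop
    return droppable_tombstones / total_columns > tombstone_threshold


print(should_compact_alone(300, 1000))  # True  (ratio 0.3 > 0.2)
print(should_compact_alone(150, 1000))  # False (ratio 0.15)
```

Raising tombstone_threshold makes standalone compactions rarer; lowering it reclaims space sooner at the cost of more compaction work.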

Behind the scenes

Each SSTable carries statistics about its data, such as histograms of row size and column count, in its Stats.db file. Starting with version 1.2, when writing an SSTable to disk, Cassandra also tracks the time at which the tombstone of each TTLed or deleted column becomes removable, and stores this information in Stats.db. We cannot store every individual time, because the number of columns can run into the billions. Instead, Cassandra builds a histogram from the stream of columns using the algorithm described in this paper. The resulting histogram looks like the one below.

Tombstone drop time histogram

Here, the horizontal axis represents buckets of the Unix timestamp at which tombstones become droppable, and the vertical axis shows the number of tombstones that become droppable in each time range. By combining this histogram with the total number of columns, we can estimate the ratio of tombstones that are droppable at the time of compaction.
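Below is a minimal sketch of both steps: building a fixed-size histogram from a stream of drop times, then estimating the droppable ratio. It uses one common bounded-bin technique (insert each value as a bin, and when there are too many bins, merge the two closest adjacent bins into their count-weighted average). We assume, without confirmation from this post, that this resembles the approach in the cited paper. The class and function names are hypothetical, and count_up_to deliberately counts whole bins only, which is coarser than what a real implementation may do.

```python
import bisect


class StreamingHistogram:
    """Hypothetical bounded histogram holding at most max_bins (point, count) bins."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.points = []  # sorted bin centres, e.g. tombstone drop timestamps
        self.counts = []  # number of tombstones each bin represents

    def update(self, value, count=1):
        i = bisect.bisect_left(self.points, value)
        if i < len(self.points) and self.points[i] == value:
            self.counts[i] += count  # value already has its own bin
        else:
            self.points.insert(i, value)
            self.counts.insert(i, count)
            if len(self.points) > self.max_bins:
                self._merge_closest_pair()

    def _merge_closest_pair(self):
        # Replace the two adjacent bins with the smallest gap by a single bin
        # at their count-weighted average position.
        j = min(range(len(self.points) - 1),
                key=lambda k: self.points[k + 1] - self.points[k])
        merged = self.counts[j] + self.counts[j + 1]
        centre = (self.points[j] * self.counts[j]
                  + self.points[j + 1] * self.counts[j + 1]) / merged
        self.points[j:j + 2] = [centre]
        self.counts[j:j + 2] = [merged]

    def bins(self):
        return list(zip(self.points, self.counts))

    def count_up_to(self, now):
        # Simplification: count whole bins whose centre is <= now, with no
        # interpolation inside the bin that straddles `now`.
        return sum(c for p, c in zip(self.points, self.counts) if p <= now)


def estimated_droppable_ratio(histogram, total_columns, now):
    """Estimated fraction of an SSTable's columns whose tombstones are droppable."""
    if total_columns == 0:
        return 0.0
    return histogram.count_up_to(now) / total_columns


hist = StreamingHistogram(max_bins=3)
for drop_time in [100, 101, 200, 300, 300]:
    hist.update(drop_time)
print(hist.bins())                                                 # [(100.5, 2), (200, 1), (300, 2)]
print(estimated_droppable_ratio(hist, total_columns=10, now=250))  # 0.3
```

Memory stays bounded by max_bins no matter how many columns are written, which is the property that makes storing this summary in Stats.db practical.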

Wrap up

In version 1.2, Cassandra performs a standalone compaction based on the ratio of droppable tombstones in an SSTable. With this improvement, tombstones are removed from disk more frequently. Support for the leveled compaction strategy is still under development, but it is also expected in 1.2.
