Apache Cassandra™ 1.2

About deletes

Cassandra deletes data differently from a traditional relational database. A relational database might spend time scanning for expired data and discarding it, or an administrator might have to partition expired data, by month for example, to clear it out faster. In Cassandra, you do not have to remove expired data manually. Keep two facts about deleted Cassandra data in mind:
  • Cassandra does not immediately remove deleted data from disk.
  • A deleted column can reappear if you do not run node repair routinely.

After an SSTable is written, it is immutable (the file is not updated by further DML operations). Consequently, a deleted column is not removed immediately. Instead, a tombstone is written to indicate the new column status. Columns marked with a tombstone exist for a configured time period (defined by the gc_grace_seconds value set on the table). Once the grace period expires, the column becomes eligible for permanent removal by the compaction process.
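
The tombstone lifecycle can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Cassandra's actual storage code: the names Cell and compact are hypothetical, SSTables are modeled as plain dictionaries, timestamps are whole seconds, and every SSTable takes part in the compaction.

```python
from dataclasses import dataclass
from typing import Optional

GC_GRACE_SECONDS = 864000  # 10 days, the default gc_grace_seconds


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a column; value None marks a tombstone."""
    value: Optional[str]
    timestamp: int  # write time in seconds (simplified)


def compact(sstables, now, gc_grace=GC_GRACE_SECONDS):
    """Merge SSTables, keeping the newest cell per column name."""
    merged = {}
    for table in sstables:
        for name, cell in table.items():
            if name not in merged or cell.timestamp > merged[name].timestamp:
                merged[name] = cell
    # Drop tombstones whose grace period has expired.
    return {
        name: cell
        for name, cell in merged.items()
        if not (cell.value is None and now - cell.timestamp >= gc_grace)
    }


# A write and a later delete land in separate SSTables; neither is modified.
sstable_1 = {"email": Cell("a@example.com", timestamp=0)}
sstable_2 = {"email": Cell(None, timestamp=100)}  # the tombstone

# Compaction inside the grace period keeps the tombstone.
early = compact([sstable_1, sstable_2], now=200)

# Compaction after the grace period removes the column entirely.
late = compact([sstable_1, sstable_2], now=100 + GC_GRACE_SECONDS)
```

In this model the delete never touches sstable_1. The column disappears only when a compaction runs after the grace period has elapsed.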

Marking a deleted column with a tombstone signals Cassandra to retry sending a delete request to a replica that was down at the time of the delete. If the replica comes back up within the grace period, it eventually receives the delete request. However, if a node is down longer than the grace period, the node can miss the delete altogether and replicate the deleted data once it comes back up. To prevent deleted data from reappearing, administrators must run node repair on every node in the cluster at least once within each grace period (by default, gc_grace_seconds is 864000 seconds, or 10 days).
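
The following sketch shows why a missed delete can bring data back. It is a toy model under stated assumptions, not how Cassandra implements repair: replicas are plain dictionaries of (value, timestamp) tuples, a value of None stands for a tombstone, and the helper names newest, repair, purge_expired_tombstones, and scenario are hypothetical.

```python
GC_GRACE_SECONDS = 864000  # 10 days, the default gc_grace_seconds


def newest(a, b):
    """Pick the cell with the later timestamp; None means 'replica has nothing'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a[1] >= b[1] else b


def repair(replica_a, replica_b, column):
    """Simplified repair: both replicas converge on the newest known cell."""
    winner = newest(replica_a.get(column), replica_b.get(column))
    if winner is not None:
        replica_a[column] = winner
        replica_b[column] = winner


def purge_expired_tombstones(replica, now, gc_grace=GC_GRACE_SECONDS):
    """Simplified compaction: drop tombstones older than the grace period."""
    expired = [
        name
        for name, (value, ts) in replica.items()
        if value is None and now - ts >= gc_grace
    ]
    for name in expired:
        del replica[name]


def scenario(repair_within_grace):
    a = {"email": ("a@example.com", 0)}
    b = {"email": ("a@example.com", 0)}
    # The delete at t=100 reaches replica a only; replica b is down.
    a["email"] = (None, 100)
    if repair_within_grace:
        repair(a, b, "email")  # b receives the tombstone in time
    now = 100 + GC_GRACE_SECONDS
    purge_expired_tombstones(a, now)
    purge_expired_tombstones(b, now)
    repair(a, b, "email")  # a repair after the grace period
    return a.get("email")


repaired_in_time = scenario(repair_within_grace=True)
repaired_too_late = scenario(repair_within_grace=False)
```

When the repair runs inside the grace period, both replicas hold the tombstone before it is purged and the column stays deleted. When it does not, replica a forgets the delete while replica b still holds the old value, so the later repair copies the deleted data back to replica a.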