Cassandra provides automated compaction and repair processes to continuously optimize disk usage and maintain data consistency. Additionally, there are manual compaction and repair operations that you can launch with the nodetool utility.
For backing up and restoring data, Cassandra provides snapshots together with incremental backups.
Regularly scheduled, per-node repair operations can be especially important for the health of a Cassandra system that frequently over-writes or deletes data. See Anti-Entropy Repair with nodetool below.
Cassandra repairs missing or inconsistent data in one of two ways: through read repair, and through the nodetool repair command. Read repair runs automatically at runtime in response to read requests, performing a constant check on heavily-used data. Nodetool repair is a manual process that you can run on nodes through the command line.
When a particular key is requested by a read operation, read repair is automatically run to update any inconsistent or stale values for that key.
If the consistency level of the request is higher than ONE (QUORUM, LOCAL_QUORUM, EACH_QUORUM or ALL), read repair is performed before returning a value for the triggering request.
When the read operation has a consistency level of ONE, read repair may run in the background while a potentially inconsistent value is returned. You can disable or tune the frequency of this background read repair by setting the read_repair_chance parameter in cassandra.yaml (by default, read_repair_chance is 1.0, so read repair is always performed).
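For example, in a cassandra.yaml that defines column families inline (the 0.7-style configuration format; in later versions this is a per-column-family schema attribute instead), the relevant fragment of a column family definition might look like the following, where Keyspace1 and Standard1 are placeholder names:

    keyspaces:
        - name: Keyspace1
          column_families:
              - name: Standard1
                # Run background read repair on only 10% of reads at consistency level ONE
                read_repair_chance: 0.1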
The nodetool repair command uses a process called Anti-Entropy to build a Merkle tree from the data on the target node, which is then compared against the trees on replica nodes to discover and repair inconsistencies. Running this process is essential both when nodes have suffered significant failures affecting data consistency, and in Cassandra systems that over-write or delete data.
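For example, to run a repair against a single node over its JMX port (depending on your Cassandra version, you may also be able to append a keyspace name to limit the repair's scope):

$ nodetool -h localhost -p 8080 repair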
Unless Cassandra applications perform no deletes at all, production clusters must schedule repair to run periodically on all nodes. The hard requirement for repair frequency is the value used for gc_grace_seconds – make sure you run a repair operation at least once on each node within this time period. Following this important guideline can ensure that deletes are properly handled in the cluster.
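For example, with the default gc_grace_seconds of 864000 (10 days), each node must be repaired at least once every 10 days. If a replica misses a delete and the tombstone is garbage-collected on the other replicas before repair runs, the deleted data can reappear.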
Anti-entropy is an expensive operation in both disk and CPU consumption. Use caution when running nodetool repair on more than one node at a time, and schedule regular repair operations for low-usage hours.
In systems that seldom delete or overwrite, it is possible to raise the value of gc_grace_seconds at a minimal cost in extra disk space used. This allows wider intervals for scheduling repair operations with the nodetool utility.
To monitor the progress of a nodetool repair operation, watch the active task queue in CompactionManager in conjunction with the active and pending counts on the AE-SERVICE-STAGE and STREAMING-STAGE thread pools.
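The active and pending counts for those stages can also be checked from the command line, for example:

$ nodetool -h localhost -p 8080 tpstats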
During normal operations, numerous SSTables may be created on disk for a given column family. Compaction is the process of merging multiple SSTables into one. Additionally, the compaction process merges keys, combines columns, discards tombstones and creates a new index in the merged SSTable.
Compaction in Cassandra is performed at two levels: minor compaction and major compaction. Minor compactions are performed regularly and automatically, as scheduled for each column family. Major compaction is a CPU-intensive process that you must initiate with nodetool compact on a per-keyspace basis.
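For example, to run a major compaction against a single keyspace (Keyspace1 is a placeholder keyspace name):

$ nodetool -h localhost -p 8080 compact Keyspace1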
The frequency and scope of minor compactions is controlled by the following column family options in cassandra.yaml:

min_compaction_threshold (default: 4)
max_compaction_threshold (default: 32)
These parameters set thresholds for the number of similar-sized SSTables that can accumulate before a minor compaction is triggered. With the default values, a minor compaction may begin any time after four SSTables are created on disk for a column family, and must begin before 32 SSTables accumulate.
You can tune these values by editing cassandra.yaml, or by using nodetool setcompactionthreshold.
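For example, to set the thresholds back to their defaults of 4 and 32 (the exact arguments vary by Cassandra version; some releases also expect the keyspace and column family names):

$ nodetool -h localhost -p 8080 setcompactionthreshold 4 32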
To monitor compaction operations, use a JMX-compliant tool that can view the statistics exposed by the CompactionManagerMBean. These attributes are listed in the JMX pending tasks section.
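In JConsole, for example, these attributes appear under the org.apache.cassandra.db:type=CompactionManager MBean.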
A major compaction process merges all SSTables for all column families in a keyspace, not just similar-sized ones as in minor compaction. Note that this may create extremely large SSTables, resulting in long intervals before the next minor compaction (and a corresponding increase in CPU usage for each minor compaction).
Though a major compaction ultimately frees disk space used by accumulated SSTables, during runtime it can temporarily double disk space usage. It is best to run major compactions, if at all, at times of low demand on the cluster.
Cassandra can capture a snapshot of a keyspace’s data during online operations, on a global or per-node basis. Used together with configurable per-SSTable incremental backup operations, snapshots can provide a dependable backup mechanism.
By snapshotting the cluster, you can achieve an “eventually consistent backup.” Except in the atypical case where all data is written with a consistency level of “ALL,” no individual node’s snapshot is guaranteed to be consistent; but if you restore from a given snapshot, the system as a whole can resume consistent behavior.
Incremental backups complete the picture by backing up data updated since the last snapshot. With incremental backups enabled (they are disabled by default), each time an SSTable is flushed, a hard link to it is created in a backups subdirectory under the keyspace data directory.
You can create snapshots with the nodetool utility, or a single, cluster-wide snapshot using the clustertool utility. Cassandra does not automatically clear snapshots, so you must remove them periodically using these utilities.
To create and clear snapshots of a node:
Run the nodetool utility with the following command:
$ nodetool -h localhost -p 8080 snapshot *snapshot_name*
You can provide an optional name that may be helpful for managing snapshots. Snapshots are saved in the Cassandra data directory, typically /var/lib/cassandra/data/*keyspace_name*/snapshots/*snapshot_name*/. Each *snapshot_name* folder contains numerous *.db files containing the data at the time of the snapshot.
All existing snapshots for a node can be cleared with the following command:
$ nodetool -h localhost -p 8080 clearsnapshot
To create and clear cluster-wide snapshots:
Run the clustertool utility with the following command:
$ clustertool -h localhost -p 8080 global_snapshot *snapshot_name*
To clear all global snapshots:
$ clustertool -h localhost -p 8080 clear_global_snapshot
When incremental backups are enabled (they are disabled by default), Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows you to store backups offsite without transferring entire snapshots. Also, incremental backups combine with snapshots to provide a dependable, up-to-date backup mechanism.
To enable incremental backups, edit cassandra.yaml to change the value of incremental_backups to true.
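The relevant line in cassandra.yaml looks like this:

incremental_backups: true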
As with snapshots, Cassandra does not automatically clear incremental backup files. DataStax recommends setting up a process to clear incremental backup links each time a new snapshot is created.
To restore a keyspace’s data with incremental backups, you need one snapshot for each keyspace together with the incremental backup files created after the snapshot.
Restoring from snapshots and incremental backups will temporarily cause intensive I/O activity as Cassandra runs numerous compaction operations.
To restore from a snapshot and incremental backups: