This section discusses routine management and maintenance tasks.
The nodetool repair command repairs inconsistencies across all of the replicas for a given range of data. Repair should be run at regular intervals during normal operations, as well as during node recovery scenarios, such as bringing a node back into the cluster after a failure.
Unless Cassandra applications perform no deletes at all, production clusters require periodic, scheduled repairs on all nodes. The hard upper bound on the repair interval is the value of gc_grace_seconds: run a repair operation at least once on each node within this time period. Deleted data is tracked with tombstones that become eligible for removal after gc_grace_seconds, so a replica that missed a delete and is not repaired within that window can cause the deleted data to reappear. Following this guideline ensures that deletes are properly handled in the cluster.
Note
Repair is disk- and CPU-intensive. Use caution when running node repair on more than one node at a time, and schedule regular repair operations for low-usage hours.
In systems that seldom delete or overwrite data, you can raise the value of gc_grace_seconds at minimal cost in extra disk space. This allows wider intervals between scheduled repair operations with the nodetool utility.
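As a rough illustration of the scheduling rule above, the following Python sketch flags nodes whose last repair is approaching the gc_grace_seconds limit. The function names, the safety_margin parameter, and the node addresses are hypothetical. The value 864000 (10 days) is assumed for gc_grace_seconds; it is the Cassandra default, but your column families may be configured differently.

```python
from datetime import datetime, timedelta

# Assumption: the default gc_grace_seconds of 10 days; adjust to your schema.
DEFAULT_GC_GRACE_SECONDS = 864000


def repair_deadline(last_repair, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Latest time by which the next repair of a node must have run."""
    return last_repair + timedelta(seconds=gc_grace_seconds)


def nodes_needing_repair(last_repairs, now,
                         gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS,
                         safety_margin=0.2):
    """Return nodes that are due for repair, most urgent first.

    last_repairs maps a node address to the datetime of its last repair.
    safety_margin (hypothetical knob) reserves a fraction of the window
    so that a repair is started before the hard limit is reached.
    """
    window = timedelta(seconds=gc_grace_seconds * (1 - safety_margin))
    due = [(last + window, node)
           for node, last in last_repairs.items()
           if now >= last + window]
    return [node for _, node in sorted(due)]


if __name__ == "__main__":
    # Hypothetical repair history for a three-node cluster.
    history = {
        "10.0.0.1": datetime(2012, 1, 19),
        "10.0.0.2": datetime(2012, 1, 11),
        "10.0.0.3": datetime(2012, 1, 5),
    }
    for node in nodes_needing_repair(history, now=datetime(2012, 1, 20)):
        print("repair due on", node, "hard limit", repair_deadline(history[node]))
```

Sorting by the margin-adjusted deadline puts the node closest to its hard limit first, which fits the advice to repair one node at a time.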
Cassandra allows you to add capacity to a cluster by introducing new nodes to the cluster in stages. When a new node joins an existing cluster, it needs to know:
You set the Node and Cluster Initialization Properties in the cassandra.yaml file.
When you add a node to a cluster, it needs to know its position in the ring. There are a few different approaches for calculating tokens for new nodes:
Add capacity by doubling the cluster size. Adding capacity by doubling (or tripling or quadrupling) the number of nodes is operationally less complicated when assigning tokens. Existing nodes can keep their existing token assignments, and new nodes are assigned tokens that bisect (or trisect) the existing token ranges. For example, when you generate tokens for 6 nodes, three of the generated token values will be the same as if you generated for 3 nodes. You just need to determine the token values that are already in use, and assign the newly calculated token values to the newly added nodes.
Recalculate new tokens for all nodes and move nodes around the ring. If doubling the cluster size is not feasible, and you need to increase capacity by a non-uniform number of nodes, you will have to recalculate tokens for the entire cluster. Assign the new tokens to existing nodes using nodetool move. After all nodes have their new token assignments, run nodetool cleanup on all nodes to remove keys they no longer own. These operations are resource intensive and should be planned for low-usage times.
Add one node at a time and leave the initial_token property empty. When the initial_token is empty, Cassandra splits the token range of the most heavily loaded node and places the new node into the ring at that position. This approach will probably not result in a perfectly balanced ring, but it will alleviate hot spots.
Note
If you have DataStax OpsCenter Enterprise Edition, you can quickly add nodes to the cluster using this approach and then use its rebalance feature to automatically calculate balanced token ranges, move tokens accordingly, and then perform cleanup on the nodes after the moves are complete.
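The token arithmetic behind the first two approaches above can be sketched in Python. This is a minimal sketch that assumes the RandomPartitioner, whose token space runs from 0 to 2**127. The function names balanced_tokens and plan_expansion are hypothetical and not part of any Cassandra tool.

```python
# Assumption: RandomPartitioner, with a token space of 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced initial_token values for a ring of node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def plan_expansion(existing_tokens, new_count):
    """Compare the tokens in use with a balanced ring of new_count nodes.

    Returns three lists:
      keep      - existing tokens that are already on a balanced position
      must_move - existing tokens that need `nodetool move` to a new value
      free      - balanced tokens not yet in use (for moved and new nodes)
    """
    target = balanced_tokens(new_count)
    target_set = set(target)
    in_use = set(existing_tokens)
    keep = [t for t in existing_tokens if t in target_set]
    must_move = [t for t in existing_tokens if t not in target_set]
    free = [t for t in target if t not in in_use]
    return keep, must_move, free


if __name__ == "__main__":
    current = balanced_tokens(3)

    # Doubling from 3 to 6 nodes: no existing node has to move.
    keep, must_move, free = plan_expansion(current, 6)
    print("doubling: nodes to move =", len(must_move), "new tokens =", free)

    # Growing from 3 to 5 nodes: most existing nodes need a new token.
    keep, must_move, free = plan_expansion(current, 5)
    print("3 -> 5: nodes to move =", len(must_move), "tokens to assign =", free)
```

When the node count is doubled, must_move comes back empty and free holds exactly the tokens for the new nodes. For a non-uniform increase such as 3 to 5 nodes, only the node at token 0 keeps its position. Under these assumptions, balanced_tokens(6)[1] evaluates to 28356863910078205288614550619314017621.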
Note
The rebalance feature in DataStax OpsCenter Enterprise Edition automatically calculates balanced token ranges and performs steps 6 and 7 on each node in the cluster in the correct order.
Increasing the replication factor increases the total number of copies of keyspace data stored in a Cassandra cluster.
Update each keyspace in the cluster and change its replication strategy options. For example, to update the number of replicas in Cassandra CLI when using the SimpleStrategy replica placement strategy:
[default@unknown] UPDATE KEYSPACE demo
WITH strategy_options = [{replication_factor:3}];
Or if using NetworkTopologyStrategy:
[default@unknown] UPDATE KEYSPACE demo
WITH strategy_options = [{datacenter1:6,datacenter2:6}];
On each node in the cluster, run nodetool repair for each keyspace that was updated. Wait until repair completes on a node before moving to the next node.
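The node-by-node repair described above could be scripted along the following lines. This is a sketch under stated assumptions: nodetool is on the PATH of the machine running the script, each `nodetool -h <host> repair <keyspace>` call blocks until that repair finishes, and the host addresses and keyspace name are placeholders. The helper names repair_commands and run_repairs are hypothetical.

```python
import subprocess


def repair_commands(hosts, keyspaces):
    """Build one nodetool repair command per (host, keyspace) pair.

    Commands are grouped by host, so every keyspace is repaired on one
    node before the next node is touched.
    """
    return [["nodetool", "-h", host, "repair", keyspace]
            for host in hosts
            for keyspace in keyspaces]


def run_repairs(hosts, keyspaces, runner=subprocess.check_call):
    """Run the repairs strictly one after another.

    runner defaults to subprocess.check_call, which waits for the command
    to exit and raises CalledProcessError on a non-zero exit status, so a
    failed repair stops the sequence instead of moving to the next node.
    """
    for command in repair_commands(hosts, keyspaces):
        runner(command)


if __name__ == "__main__":
    # Placeholder addresses and keyspace; print the plan instead of running it.
    for command in repair_commands(["10.0.0.1", "10.0.0.2"], ["demo"]):
        print(" ".join(command))
```

Passing the runner as a parameter keeps the sequencing logic testable without a live cluster. To execute the repairs, call run_repairs with the same arguments.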
To replace a node that has died (due to hardware failure, for example), bring up a new node in its place by starting it with the -Dcassandra.replace_token=<token> parameter, so that the new node assumes the token position of the dead node.
To replace a dead node:
Confirm the dead node using the nodetool ring command on any live node in the cluster. Note the Down status and the token value of the dead node.
Prepare the replacement node by installing Cassandra and correctly configuring its cassandra.yaml file.
Start Cassandra on the new node using the startup property -Dcassandra.replace_token=<token> and pass in the same token that was used by the dead node. For example:
$ cassandra -Dcassandra.replace_token=28356863910078205288614550619314017621
The new node will start in a hibernate state and begin to bootstrap data from its associated replica nodes. During this time, the node will not accept writes and is seen as down to other nodes in the cluster. When the bootstrap is complete, the node will be marked as up and any missed writes that occurred during bootstrap will be replayed using hinted handoff.
Once the new node is up, it is strongly recommended to run nodetool repair on each keyspace to ensure the node is fully consistent. For example:
$ nodetool repair -h 10.46.123.12 keyspace_name -pr