This section discusses routine management and maintenance tasks.
The nodetool repair command repairs inconsistencies across all of the replicas for a given range of data. Repair should be run at regular intervals during normal operations, as well as during node recovery scenarios (bringing a node back into the cluster after a failure).
Unless Cassandra applications perform no deletes at all, production clusters must schedule repair to run periodically on all nodes. The hard requirement for repair frequency is the value used for gc_grace_seconds – make sure you run a repair operation at least once on each node within this time period. Following this important guideline can ensure that deletes are properly handled in the cluster.
Note
Repair is an expensive operation in both disk and CPU consumption. Use caution when running node repair on more than one node at a time, and schedule regular repair operations for low-usage hours.
In systems that seldom delete or overwrite data, it is possible to raise the value of gc_grace_seconds at a minimal cost in extra disk space used. This allows wider intervals for scheduling repair operations with the nodetool utility.
Cassandra allows you to add capacity to a cluster by introducing new nodes to the cluster in stages. When a new node joins an existing cluster, it needs to know:
When you add a node to a cluster, it needs to know its position in the ring. There are a few different approaches for calculating tokens for new nodes:
To calculate tokens for expansion:
vi move_tokens.py
#! /usr/bin/python import sys if len(sys.argv) != 3: print "usage: move_tokens.py <current_size> <new_size>" sys.exit(1) CURRENT_NODES = int(sys.argv[1]) GOING_TO_NODES = int(sys.argv[2]) RING_SIZE = 2**127 def tokens(n): rv = [] for x in xrange(n): rv.append(RING_SIZE / n * x) return rv def print_layout(nodes): i = 0 for n in nodes: i += 1 print "%d:\t%d" % (i, n) current = tokens(CURRENT_NODES) go = tokens(GOING_TO_NODES) print "Existing Tokens: " print_layout(current) print "" print "" print "New Tokens: " print_layout(go) print "" print "" i = 0 j = 0 pending = [] while j != len(go) and i != len(current): cur = current[i] next = go[j] if next == cur: if pending: for x in pending: print x pending = [] print "[%d] Old Node %d stays at %d" % (j+1, i+1, cur) i += 1 if next > cur: if pending: for x in pending: print x pending = [] #diff = (next - cur) print "[%d] Old Node %d moves to %d" % (j+1, i+1, next) i += 1 if next < cur: pending.insert(0, "[%d] New Node added at %d" % (j+1, next)) j += 1 if pending: for x in pending: print x pending = [] while j != len(go): print "[%d] Add new node %d at %d" % (j+1, j+1, go[j]) j += 1
Make it executable:
chmod +x move_tokens.py
Run the program and supply the number of nodes in your current cluster and the number of nodes in your expanded cluster. For example, if expanding from a 6 node cluster to a 9 node cluster:
./move_tokens.py 6 9 [1] Old Node 1 stays at 0 [2] New Node added at 18904575940052136859076367079542678414 [3] Old Node 2 moves to 37809151880104273718152734159085356828 [4] Old Node 3 stays at 56713727820156410577229101238628035242 [5] New Node added at 75618303760208547436305468318170713656 [6] Old Node 4 moves to 94522879700260684295381835397713392070 [7] Old Node 5 stays at 113427455640312821154458202477256070484 [8] New Node added at 132332031580364958013534569556798748898 [9] Old Node 6 moves to 151236607520417094872610936636341427312
Based on the the above example output, first you would install and initialize the new nodes at token positions 2, 8 and 5. Then you would run nodetool move for token positions 3, 6 and 9. Positions 1, 4, and 7 can keep their original token assignments. After all nodes are up with their new tokens, run nodetool cleanup on each node in the cluster.
Increasing the replication factor increases the total number of copies of keyspace data stored in a Cassandra cluster.
Update each keyspace in the cluster and change its replication strategy options. For example to update the number of replicas in Cassandra CLI when using SimpleStrategy replica placement strategy:
[default@unknown] UPDATE KEYSPACE demo WITH strategy_options = [{replication_factor:3}];
Or if using NetworkTopologyStrategy:
[default@unknown] UPDATE KEYSPACE demo WITH strategy_options = [{datacenter1:6,datacenter2:6}];
On each node in the cluster, run nodetool repair for each keyspace that was updated. Wait until repair completes on a node before moving to the next node.
Node failures can be handled by bringing up a replacement node in one of two ways: autobootstrap with a new IP Address or use the same IP Address and token as the previous host and nodetool repair. Which of these processes to choose depends on your client’s consistency levels and your tolerance for stale or missing data. In discussing these two approaches, we will use the nodes in the following ring as an example:
$ nodetool -h localhost -p 8080 ring
Address Status State Load Owns Range Ring
95315431979199388464207182617231204396
10.194.171.160 Down Normal ? 39.98 61078635599166706937511052402724559481 |<--|
10.196.14.48 Up Normal 3.16 KB 30.01 78197033789183047700859117509977881938 | |
10.196.14.239 Up Normal 3.16 KB 30.01 95315431979199388464207182617231204396 |-->|
During the autobootstrap process the node will not receive reads until bootstrapping is complete. Further, the ring will go down to two nodes when the dead node is removed:
$ nodetool -h localhost -p 8080 ring
Address Status State Load Owns Range Ring
95315431979199388464207182617231204396
10.196.14.48 Up Normal 3.16 KB 30.01 78197033789183047700859117509977881938 |<--|
10.196.14.239 Up Normal 3.16 KB 30.01 95315431979199388464207182617231204396 |-->|
Once the node has completed bootstrapping, the ring will again show three nodes:
$ nodetool -h localhost -p 8080 ring
Address Status State Load Owns Range Ring
95315431979199388464207182617231204396
10.196.14.48 Up Normal 3.16 KB 30.01 78197033789183047700859117509977881938 |<--|
10.194.171.160Up Normal 495 bytes 01.75 86756232884191218082533150063604543167 | |
10.196.14.239 Up Normal 3.16 KB 30.01 95315431979199388464207182617231204396 |-->|
This is the safest approach if the client read consistency level is frequently ONE. The downside of this approach is that we have relied on the cluster to auto-balance the load and our replacement node now has a different token.
If you do more QUORUM reads, you can tolerate empty results being returned, or you want to maintain the token assignments on the ring, this is the preferred approach. The following ring will be used in this example (the same status from the first example):
$ nodetool -h localhost -p 8080 ring
Address Status State Load Owns Range Ring
95315431979199388464207182617231204396
10.194.171.160 Down Normal ? 39.98 61078635599166706937511052402724559481 |<--|
10.196.14.48 Up Normal 3.16 KB 30.01 78197033789183047700859117509977881938 | |
10.196.14.239 Up Normal 3.16 KB 30.01 95315431979199388464207182617231204396 |-->|
First, pick a token for your new node which is -1 from the token of the dead node and start the new node. Given the above ring, our value for initial_token will be:
86756232884191218082533150063604543166
Once autoboostrap is complete, the ring should now have four nodes, three active and one down:
$ nodetool -h localhost -p 8080 ring
Address Status State Load Owns Range Ring
95315431979199388464207182617231204396
10.196.14.48 Up Normal 3.16 KB 30.01 78197033789183047700859117509977881938 |<--|
10.203.30.139 Up Normal 495 bytes 01.75 86756232884191218082533150063604543166 | |
10.194.171.160Down Normal ? 01.75 86756232884191218082533150063604543167 | |
10.196.14.239 Up Normal 3.16 KB 30.01 95315431979199388464207182617231204396 |-->|
Remove the dead node from the ring with nodetool removetoken using the down node’s token:
$ nodetool -h 10.196.14.239 -p 8080 removetoken 86756232884191218082533150063604543167
Verify the integrity of the ring:
$ nodetool -h 10.196.14.239 -p 8080 ring
Address Status State Load Owns Range Ring
95315431979199388464207182617231204396
10.196.14.48 Up Normal 3.16 KB 30.01 78197033789183047700859117509977881938 |<--|
10.203.30.139 Up Normal 495 bytes 01.75 86756232884191218082533150063604543166 | |
10.196.14.239 Up Normal 3.16 KB 30.01 95315431979199388464207182617231204396 |-->|
Last, run nodetool repair for each keyspace against the next node on the ring:
$ nodetool -h 10.196.14.239 -p 8080 repair Keyspace1