Tokens assign a range of data to a particular node within a data center.
When you start a Cassandra cluster, a partitioner distributes data across the nodes in the cluster based on the row key. You must assign each node in the cluster a token; that token determines the node's position in the ring and therefore its range of data. The tokens assigned to your nodes need to be distributed across the entire possible range of tokens (0 to 2^127 - 1). Each node is responsible for the region of the ring between itself (inclusive) and its predecessor (exclusive). For example, if the range of possible tokens were 0 to 100 and you had four nodes, the tokens for your nodes should be 0, 25, 50, and 75. This approach ensures that each node is responsible for an equal range of data. When using more than one data center, partition each data center as if it were its own distinct ring.
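A minimal Python sketch of the two rules in the paragraph above: evenly spaced tokens (node i gets i * range / N) and ownership of the region between a node's predecessor (exclusive) and the node itself (inclusive). The helpers `balanced_tokens` and `owner` are hypothetical names for illustration; they are not part of Cassandra or its tools.

```python
import bisect


def balanced_tokens(num_nodes: int, ring_size: int) -> list[int]:
    """Hypothetical helper: node i gets token i * ring_size // num_nodes."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return [i * ring_size // num_nodes for i in range(num_nodes)]


def owner(key_token: int, node_tokens: list[int]) -> int:
    """Hypothetical helper: return the node token whose range contains key_token.

    Each node owns (predecessor, itself], so the owner is the first node token
    >= key_token; anything past the highest node token wraps to the lowest one.
    """
    ordered = sorted(node_tokens)
    i = bisect.bisect_left(ordered, key_token)
    return ordered[i] if i < len(ordered) else ordered[0]


tokens = balanced_tokens(4, 100)  # the 0-to-100 example from the text
print(tokens)             # [0, 25, 50, 75]
print(owner(25, tokens))  # 25: a node's own token is inclusive
print(owner(30, tokens))  # 50: 30 falls in (25, 50]
print(owner(80, tokens))  # 0: 80 falls past 75 and wraps around to node 0
```

Integer division keeps every range within one token of the others when the ring size is not evenly divisible by the node count.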
Cassandra includes a tool, token-generator, that generates tokens across the maximum possible range (0 to 2^127 - 1) for use with the RandomPartitioner.
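The tool takes the number of nodes in each data center as arguments (for example, 4 for one data center of four nodes, or 4 4 for two data centers of four nodes each). The available options are: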
|--ringrange <RINGRANGE>||Specify a numeric maximum token value for your ring, if different from the default value of 2^127 - 1.|
|--graph||Displays a rendering of the generated tokens as line segments in a circle, colored according to data center.|
|Optimizes multiple cluster distribution for NetworkTopologyStrategy (default).|
|Optimizes multiple cluster distribution for the OldNetworkTopologyStrategy.|
|Run in test mode. Opens Firefox and displays an HTML file that shows various ring arrangements.|
Generate tokens for nodes in a single data center:

    ./tools/bin/token-generator 4
    Node #1: 0
    Node #2: 42535295865117307932921825928971026432
    Node #3: 85070591730234615865843651857942052864
    Node #4: 127605887595351923798765477786913079296
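The values printed above are exactly i * 2^127 / 4 for i = 0 to 3, which a short Python check can confirm. This assumes the tool's ring size is 2^127 (the number of tokens from 0 to 2^127 - 1); `balanced_tokens` is again a hypothetical helper, not the tool's own code.

```python
RING_SIZE = 2**127  # assumed RandomPartitioner ring: tokens 0 .. 2**127 - 1


def balanced_tokens(num_nodes: int, ring_size: int = RING_SIZE) -> list[int]:
    """Hypothetical helper: evenly spaced tokens, node i at i * ring_size // num_nodes."""
    return [i * ring_size // num_nodes for i in range(num_nodes)]


# Copied from the `./tools/bin/token-generator 4` output above.
tool_output = [
    0,
    42535295865117307932921825928971026432,
    85070591730234615865843651857942052864,
    127605887595351923798765477786913079296,
]

computed = balanced_tokens(4)
for n, token in enumerate(computed, start=1):
    print(f"Node #{n}: {token}")
print(computed == tool_output)  # True
```

Python integers have arbitrary precision, so the 39-digit tokens are computed exactly with no overflow.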
Generate tokens for multiple data centers using NetworkTopologyStrategy (default):

    ./tools/bin/token-generator 4 4
    DC #1:
      Node #1: 0
      Node #2: 42535295865117307932921825928971026432
      Node #3: 85070591730234615865843651857942052864
      Node #4: 127605887595351923798765477786913079296
    DC #2:
      Node #1: 169417178424467235000914166253263322299
      Node #2: 41811290829115311202148688466350243003
      Node #3: 84346586694232619135070514395321269435
      Node #4: 126881882559349927067992340324292295867
Replica placement is independent within each data center.
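The DC #2 tokens look unrelated to DC #1, but each one is the matching DC #1 token shifted by the same constant modulo 2^127. DC #2 is therefore still an evenly spaced ring of its own, and none of its tokens collide with DC #1. The Python check below only verifies that property of the printed values, assuming a ring size of 2^127; how the tool chooses the shift is not described here, and `gaps` is a hypothetical helper.

```python
RING_SIZE = 2**127  # assumed RandomPartitioner ring size

# Copied from the `./tools/bin/token-generator 4 4` output above.
dc1 = [
    0,
    42535295865117307932921825928971026432,
    85070591730234615865843651857942052864,
    127605887595351923798765477786913079296,
]
dc2 = [
    169417178424467235000914166253263322299,
    41811290829115311202148688466350243003,
    84346586694232619135070514395321269435,
    126881882559349927067992340324292295867,
]


def gaps(tokens: list[int], ring_size: int = RING_SIZE) -> list[int]:
    """Hypothetical helper: distance from each token to the next, wrapping at the ring end."""
    ordered = sorted(tokens)
    return [
        (ordered[(i + 1) % len(ordered)] - ordered[i]) % ring_size
        for i in range(len(ordered))
    ]


shifts = {(a - b) % RING_SIZE for a, b in zip(dc1, dc2)}
print(len(shifts))             # 1: every DC #2 token is its DC #1 token minus one constant
print(gaps(dc1) == gaps(dc2))  # True: both data centers are evenly spaced rings
print(set(dc1) & set(dc2))     # set(): no token collisions between the two
```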
Generate tokens for multiple racks in a single data center:

    ./tools/bin/token-generator 8
    DC #1:
      Node #1: 0
      Node #2: 21267647932558653966460912964485513216
      Node #3: 42535295865117307932921825928971026432
      Node #4: 63802943797675961899382738893456539648
      Node #5: 85070591730234615865843651857942052864
      Node #6: 106338239662793269832304564822427566080
      Node #7: 127605887595351923798765477786913079296
      Node #8: 148873535527910577765226390751398592512
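The 8-node output follows the same i * 2^127 / N pattern with N = 8. Because 2^127 / 8 is half of 2^127 / 4, every token from the 4-node example reappears here, and the four additional nodes sit at the midpoints between them. A quick Python check, assuming a ring size of 2^127 and using the same hypothetical `balanced_tokens` helper:

```python
RING_SIZE = 2**127  # assumed RandomPartitioner ring size


def balanced_tokens(num_nodes: int, ring_size: int = RING_SIZE) -> list[int]:
    """Hypothetical helper: evenly spaced tokens, node i at i * ring_size // num_nodes."""
    return [i * ring_size // num_nodes for i in range(num_nodes)]


eight = balanced_tokens(8)
four = balanced_tokens(4)

print(eight[1])                 # 21267647932558653966460912964485513216, Node #2 above
print(set(four) <= set(eight))  # True: the 4-node tokens are a subset
print(eight[1::2])              # Nodes #2, #4, #6, #8: the midpoints
```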
As a best practice, each rack should have the same number of nodes. This allows you to alternate the rack assignments in token order: rack1, rack2, rack3, rack1, rack2, rack3, and so on.
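Alternating rack assignments in token order means neighboring nodes on the ring land on different racks. A minimal Python sketch, assuming a hypothetical 6-node data center with three racks named rack1 to rack3; the rack names, node count, and the `assign_racks` helper are illustrative only.

```python
from itertools import cycle

RING_SIZE = 2**127  # assumed RandomPartitioner ring size


def balanced_tokens(num_nodes: int, ring_size: int = RING_SIZE) -> list[int]:
    """Hypothetical helper: evenly spaced tokens, node i at i * ring_size // num_nodes."""
    return [i * ring_size // num_nodes for i in range(num_nodes)]


def assign_racks(tokens: list[int], racks: list[str]) -> list[tuple[int, str]]:
    """Hypothetical helper: walk the ring in token order, cycling through the racks."""
    if len(tokens) % len(racks) != 0:
        raise ValueError("each rack should have the same number of nodes")
    return list(zip(sorted(tokens), cycle(racks)))


placement = assign_racks(balanced_tokens(6), ["rack1", "rack2", "rack3"])
for token, rack in placement:
    print(f"{rack}: {token}")
```

The helper rejects node counts that are not a multiple of the rack count, which is one way to enforce the equal-racks practice described above.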
When adding nodes to a cluster, you must avoid token collisions. You can do this by offsetting the token values, which leaves room for the new nodes.
The following graphic shows an example using an offset of +100:
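A minimal Python sketch of the offset approach, assuming the new nodes form a second data center with the same number of nodes and that the offset wraps around the 2^127 ring (both are assumptions; the `offset_tokens` helper is hypothetical):

```python
RING_SIZE = 2**127  # assumed RandomPartitioner ring size


def balanced_tokens(num_nodes: int, ring_size: int = RING_SIZE) -> list[int]:
    """Hypothetical helper: evenly spaced tokens, node i at i * ring_size // num_nodes."""
    return [i * ring_size // num_nodes for i in range(num_nodes)]


def offset_tokens(tokens: list[int], offset: int, ring_size: int = RING_SIZE) -> list[int]:
    """Hypothetical helper: shift every token by `offset`, wrapping at the ring end."""
    return [(t + offset) % ring_size for t in tokens]


existing = balanced_tokens(4)
new_nodes = offset_tokens(existing, 100)

for old, new in zip(existing, new_nodes):
    print(f"{old} -> {new}")
print(set(existing) & set(new_nodes))  # set(): no collisions
```

An offset of 0 would reproduce the existing tokens exactly, which is the collision the text warns about.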
Ensuring that the nodes within each data center manage an equal amount of data is more important than how the nodes are distributed across the cluster as a whole. See balancing the load.
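To see why per-data-center balance is what matters, the Python sketch below computes each node's share of its ring, assuming two 4-node data centers where the second is offset by +100. Treated as separate rings, as recommended earlier for multiple data centers, every node owns exactly a quarter; merged into one ring, the offset nodes would appear to own almost nothing. The `ownership` helper is hypothetical and ignores replication factor.

```python
from fractions import Fraction

RING_SIZE = 2**127  # assumed RandomPartitioner ring size


def ownership(tokens: list[int], ring_size: int = RING_SIZE) -> dict[int, Fraction]:
    """Hypothetical helper: fraction of the ring owned by each token.

    A node owns (predecessor, itself], so its share is the distance back to the
    previous token; for the lowest token that distance wraps around the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        gap = (token - ordered[i - 1]) % ring_size  # ordered[-1] when i == 0
        shares[token] = Fraction(gap or ring_size, ring_size)  # a lone node owns it all
    return shares


dc1 = [i * RING_SIZE // 4 for i in range(4)]
dc2 = [(t + 100) % RING_SIZE for t in dc1]

print(set(ownership(dc1).values()))  # {Fraction(1, 4)}
print(set(ownership(dc2).values()))  # {Fraction(1, 4)}

merged = ownership(dc1 + dc2)
print(float(merged[100]))  # tiny: in a single ring, node 100 would own only tokens 1..100
```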