Apache Cassandra 1.0 Documentation

Generating Tokens

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your version.

Tokens assign a range of data to a particular node within a data center.

When you start a Cassandra cluster, data is distributed across the nodes in the cluster based on the row key using a partitioner. You must assign each node in a cluster a token, and that token determines the node's position in the ring and its range of data. The tokens assigned to your nodes need to be distributed throughout the entire possible range of tokens (0 to 2^127 - 1). Each node is responsible for the region of the ring between itself (inclusive) and its predecessor (exclusive). To illustrate using a simple example, if the range of possible tokens were 0 to 100 and you had four nodes, the tokens for your nodes should be 0, 25, 50, and 75. This approach ensures that each node is responsible for an equal range of data. When using more than one data center, each data center should be partitioned as if it were its own distinct ring.
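The ownership rule can be illustrated with the toy ring from the example above. This is only a sketch of the rule as stated in this section, not Cassandra's own code; the function name owner_token is hypothetical.

```python
from bisect import bisect_left


def owner_token(node_tokens, key_token):
    """Return the token of the node responsible for key_token.

    A node owns the range from its predecessor's token (exclusive)
    up to its own token (inclusive).
    """
    ring = sorted(node_tokens)
    # The first node token that is >= key_token owns the key.
    index = bisect_left(ring, key_token)
    # Keys past the highest node token wrap around to the first node.
    return ring[index % len(ring)]


toy_ring = [0, 25, 50, 75]
for key in (0, 25, 30, 80):
    print(key, "->", owner_token(toy_ring, key))
```

In this toy ring, a key with token 30 falls to the node at 50, and a key with token 80 wraps around to the node at 0.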

Note

Each node in the cluster must be assigned a token before it is started for the first time. The token is set with the initial_token property in the cassandra.yaml configuration file.

Token Generating Tool

DataStax provides a Python program for generating tokens using the maximum possible range (0 to 2^127 - 1).

To set up the Token Generating Tool:

  1. Using a text editor, create a new file named tokengentool for your token generator program.

  2. Go to https://raw.github.com/riptano/ComboAMI/2.2/tokentoolv2.py.

  3. Copy and paste the program into the tokengentool file.

  4. Save and close the file.

  5. Make it executable:

    chmod +x tokengentool
    
  6. Run the program:

    ./tokengentool <nodes_in_dc1> <nodes_in_dc2> ...
    

    The Token Generating Tool calculates the token values.

  7. Enter the corresponding value for each node in the initial_token property of the node's cassandra.yaml file.
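As a small convenience for the last step, the calculated values can be turned into the lines to place in each node's cassandra.yaml. This is a sketch only; the helper name initial_token_lines is hypothetical, and the token list is assumed to come from the tool's output.

```python
def initial_token_lines(tokens):
    """Format one initial_token setting per node, in ring order."""
    return ["initial_token: %d" % token for token in tokens]


# Example with the toy tokens 0, 25, 50, and 75.
for node_number, line in enumerate(initial_token_lines([0, 25, 50, 75]), start=1):
    print("node %d: %s" % (node_number, line))
```

Each node receives exactly one of the lines; no two nodes should share a token.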

Calculating Tokens for a Single Data Center

For a single data center using the RandomPartitioner, enter the number of nodes in the Token Generating Tool. For example, for 6 nodes in a single data center, you enter:

./tokengentool 6

The tool displays the token for each node:

{
  "0": {
        "0": 0,
        "1": 28356863910078205288614550619314017621,
        "2": 56713727820156410577229101238628035242,
        "3": 85070591730234615865843651857942052864,
        "4": 113427455640312821154458202477256070485,
        "5": 141784319550391026443072753096570088106
        }
}
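The values above are consistent with spacing the nodes evenly across the 0 to 2^127 - 1 range using integer division. The following sketch reproduces them; it is an illustration of the arithmetic, not necessarily how the Token Generating Tool is implemented, and the names RING_SIZE and evenly_spaced_tokens are hypothetical.

```python
RING_SIZE = 2 ** 127  # number of possible RandomPartitioner tokens


def evenly_spaced_tokens(node_count):
    """Return node_count tokens spaced evenly around the ring."""
    # Integer division truncates, matching the whole-number tokens above.
    return [i * RING_SIZE // node_count for i in range(node_count)]


for node, token in enumerate(evenly_spaced_tokens(6)):
    print(node, token)
```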

Calculating Tokens for Multiple Racks in a Single Data Center

If you have multiple racks in a single data center, enter the number of nodes in the Token Generating Tool. As a best practice, each rack should have the same number of nodes so you can alternate the rack assignments, for example: rack1, rack2, rack3, rack1, rack2, rack3, and so on.

./tokengentool 8

The tool displays the token for each node. The graphic shows the rack assignments:


../../_images/multirack_tokens.png
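The alternating rack assignment can be sketched as follows. The example assumes 8 nodes split evenly across two racks named rack1 and rack2 (the rack count is an assumption made for illustration), and the helper assign_racks is hypothetical.

```python
from itertools import cycle

RING_SIZE = 2 ** 127  # number of possible RandomPartitioner tokens


def assign_racks(node_count, racks):
    """Pair evenly spaced tokens with racks in round-robin order."""
    tokens = [i * RING_SIZE // node_count for i in range(node_count)]
    # zip stops when the tokens run out, so cycle() repeats racks as needed.
    return list(zip(tokens, cycle(racks)))


for token, rack in assign_racks(8, ["rack1", "rack2"]):
    print(rack, token)
```

Walking the ring in token order, neighboring nodes always sit in different racks, which is the point of alternating the assignments.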

Calculating Tokens for a Multiple Data Center Cluster

In multiple data center deployments, use NetworkTopologyStrategy for replica placement. This strategy determines replica placement independently within each data center. For more detailed information, see NetworkTopologyStrategy.

You can use different methods for calculating the tokens in multiple data center clusters. The important point is that the nodes within each data center manage an equal amount of data; the distribution of the nodes within the cluster as a whole is less important. Two manual methods are recommended:

  • Alternate token assignments. This method works best when each data center has the same number of nodes.
  • Offset token values. This method works when data centers have different numbers of nodes (and also when they are the same size).

Alternating Token Assignments

Calculate tokens for each data center using the Token Generating Tool and then alternate the token assignments so that the nodes for each data center are evenly dispersed around the ring. In the tool, enter the number of nodes for each data center.

./tokengentool 3 3

The tool displays the token for each node in each data center:

{
  "0": {
      "0": 0,
      "1": 56713727820156410577229101238628035242,
      "2": 113427455640312821154458202477256070485
       },
  "1": {
      "0": 28356863910078205288614550619314017621,
      "1": 85070591730234615865843651857942052863,
      "2": 141784319550391026443072753096570088106
     }
}

The following image shows the token position and data center assignments:


../../_images/multidc_alternate_tokens.png
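The output above is consistent with giving each data center its own evenly spaced ring and shifting each subsequent data center by a fraction of the spacing, so that the data centers interleave. The sketch below reproduces those values for equally sized data centers; it illustrates the arithmetic under that assumption and is not necessarily the tool's implementation. The name alternating_tokens is hypothetical.

```python
RING_SIZE = 2 ** 127  # number of possible RandomPartitioner tokens


def alternating_tokens(nodes_per_dc, dc_count):
    """Return {dc_index: [tokens]} for equally sized data centers."""
    # Shift each data center by an equal share of the gap between nodes.
    shift = RING_SIZE // (nodes_per_dc * dc_count)
    return {
        dc: [i * RING_SIZE // nodes_per_dc + dc * shift
             for i in range(nodes_per_dc)]
        for dc in range(dc_count)
    }


for dc, tokens in alternating_tokens(3, 2).items():
    print(dc, tokens)
```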

Offsetting Token Assignments

To avoid token collisions, offset the token values of each additional data center. Use an offset of +100; this leaves room to replace a dead node.

The following example generates tokens for a cluster with two 3-node data centers and one 2-node data center:

./tokengentool 3

{
  "0": {
        "0": 0,
        "1": 56713727820156410577229101238628035242,
        "2": 113427455640312821154458202477256070485
        }
}

./tokengentool 2

{
  "0": {
        "0": 0,
        "1": 85070591730234615865843651857942052864
        }
}

The following graphic shows Data Center 1 using the exact values generated for 3 nodes by the Token Generating Tool, Data Center 2 using the values generated for 3 nodes offset by +100, and Data Center 3 using the values generated for 2 nodes offset by +200.


../../_images/multidc_tokens_offset.png
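The offset method described above can be sketched as follows, assuming each data center's base tokens are evenly spaced and that the offset grows by 100 per data center. The function name offset_tokens and its step parameter are hypothetical.

```python
RING_SIZE = 2 ** 127  # number of possible RandomPartitioner tokens


def offset_tokens(nodes_per_dc, step=100):
    """Return {dc_index: [tokens]}, shifting each data center by step."""
    layout = {}
    for dc_index, node_count in enumerate(nodes_per_dc):
        offset = dc_index * step  # 0 for the first data center, then +100, +200, ...
        layout[dc_index] = [i * RING_SIZE // node_count + offset
                            for i in range(node_count)]
    return layout


# Two 3-node data centers and one 2-node data center.
layout = offset_tokens([3, 3, 2])
all_tokens = [token for tokens in layout.values() for token in tokens]
# Every token in the cluster must be unique.
assert len(all_tokens) == len(set(all_tokens))
for dc_index, tokens in layout.items():
    print("Data Center %d: %s" % (dc_index + 1, tokens))
```

Without the offset, the first node of every data center would be assigned token 0, which is the collision the method avoids.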