DataStax Enterprise 2.1 Documentation

Generating Tokens

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


Tokens assign a range of data to a particular node within a data center.

When you start a DataStax Enterprise cluster, you must choose how the data (column family rows) is divided across the nodes in the cluster. A partitioner determines what each node stores by row (key). A token is a partitioner-dependent element of the cluster. Each node in a cluster is assigned a token, and that token determines the node's position in the ring and what data the node is responsible for in the cluster.

The tokens assigned to your nodes need to be distributed throughout the entire possible range of tokens. Each node is responsible for the region of the ring between itself (inclusive) and its predecessor (exclusive). As a simple example, if the range of possible tokens were 0 to 100 and you had 4 nodes, you would want the tokens for your nodes to be 0, 25, 50, and 75. This approach ensures that each node is responsible for an equal range of data. Each data center should be partitioned as if it were its own distinct ring.
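The ownership rule described above (each node owns the range from its predecessor's token, exclusive, up to its own token, inclusive) can be sketched in a few lines of Python. This is an illustrative sketch only, using the simplified 0 to 100 token range from the example; the function name owning_node is hypothetical and is not part of DataStax Enterprise.

```python
from bisect import bisect_left

def owning_node(key_token, node_tokens):
    """Return the token of the node responsible for key_token (hypothetical helper)."""
    ring = sorted(node_tokens)
    # Find the first node token >= key_token; a node's range includes its own token.
    index = bisect_left(ring, key_token)
    # A key past the highest node token wraps around to the first node on the ring.
    return ring[index % len(ring)]

node_tokens = [0, 25, 50, 75]
print(owning_node(30, node_tokens))  # prints 50
print(owning_node(25, node_tokens))  # prints 25
print(owning_node(80, node_tokens))  # prints 0 (wraps around the ring)
```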

For more detailed information, see About Data Partitioning in Cassandra.

Note

Each node in the cluster must be assigned a token before it is started for the first time. The token is set with the initial_token property in the cassandra.yaml configuration file.

Token Generating Tool

DataStax provides a Python program for generating tokens. Tokens are integers ranging from 0 to 2^127 - 1.

To set up the Token Generating Tool:

  1. Using a text editor, create a new file named tokengentool for your token generator program.

  2. Go to https://raw.github.com/riptano/ComboAMI/2.2/tokentoolv2.py.

  3. Copy and paste the program into the tokengentool file.

  4. Save and close the file.

  5. Make it executable:

    chmod +x tokengentool
    
  6. Run the program:

    ./tokengentool <nodes_in_dc1> <nodes_in_dc2> ...
    

    The Token Generating Tool calculates the token values.

  7. Enter the corresponding value for each node in the initial_token property of the node's cassandra.yaml file.
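As a convenience for step 7, the following Python sketch turns the tool's output into one initial_token line per node. It assumes the output is valid JSON keyed by data center index and then by node index, as in the sample outputs in this document; the sample text is the output shown for a 3-node data center, and the helper name initial_token_lines is hypothetical.

```python
import json

# Sample output for a 3-node data center, as shown in this document.
tool_output = """
{
  "0": {
    "0": 0,
    "1": 56713727820156410577229101238628035242,
    "2": 113427455640312821154458202477256070485
  }
}
"""

def initial_token_lines(output_text):
    """Return one 'initial_token' line per node, ordered by data center and node index."""
    tokens_by_dc = json.loads(output_text)
    lines = []
    for dc in sorted(tokens_by_dc, key=int):
        for node in sorted(tokens_by_dc[dc], key=int):
            token = tokens_by_dc[dc][node]
            lines.append("dc %s, node %s: initial_token: %d" % (dc, node, token))
    return lines

for line in initial_token_lines(tool_output):
    print(line)
```

Each printed value goes into the cassandra.yaml file of the corresponding node.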

Calculating Tokens for a Single Data Center

For a single data center, DataStax recommends always using the NetworkTopologyStrategy and the RandomPartitioner. The NetworkTopologyStrategy is as easy to use as SimpleStrategy and allows for expansion to multiple data centers in the future. It is much easier to configure the most flexible replication strategy initially than to reconfigure replication after you have already loaded data into your cluster. Be sure to configure the strategy_options for your replication strategy.

For a single data center, enter the number of nodes in the Token Generating Tool. For example, for 6 nodes in a single data center, enter:

./tokengentool 6

The tool displays the token for each node:

{
  "0": {
        "0": 0,
        "1": 28356863910078205288614550619314017621,
        "2": 56713727820156410577229101238628035242,
        "3": 85070591730234615865843651857942052864,
        "4": 113427455640312821154458202477256070485,
        "5": 141784319550391026443072753096570088106
        }
}
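The values above are consistent with dividing the token range 0 to 2^127 into equal parts. The following Python sketch reproduces them; it is not the source of the Token Generating Tool, and the function name evenly_spaced_tokens is hypothetical.

```python
RING_SIZE = 2 ** 127  # tokens range from 0 to 2^127 - 1

def evenly_spaced_tokens(node_count, ring_size=RING_SIZE):
    """Return node_count tokens spaced evenly across the ring, starting at 0."""
    # Integer division keeps every token an exact integer.
    return [i * ring_size // node_count for i in range(node_count)]

for node, token in enumerate(evenly_spaced_tokens(6)):
    print(node, token)
```

Calling evenly_spaced_tokens(4, ring_size=100) returns the tokens 0, 25, 50, and 75 from the simplified example earlier in this document.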

Calculating Tokens for Multiple Racks in a Single Data Center

If you have multiple racks in a single data center, enter the number of nodes in the Token Generating Tool and then assign the tokens to nodes in alternating racks. For example: rack1, rack2, rack3, rack1, rack2, rack3, and so on. The replica placement strategy and partitioner are the same as in Calculating Tokens for a Single Data Center.

As a best practice, each rack should have the same number of nodes so you can alternate the rack assignments. For example:

./tokengentool 8

The tool displays the token for each node. The image shows the rack assignments:


../../_images/multirack_tokens.png
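The alternating rack assignment can be sketched in Python as follows. The rack names and the choice of two racks are assumptions made for illustration (the layout in the image is not reproduced here), and assign_racks is a hypothetical helper.

```python
from itertools import cycle

RING_SIZE = 2 ** 127

def assign_racks(tokens, racks):
    """Pair tokens, in ring order, with racks in a repeating (alternating) sequence."""
    return list(zip(sorted(tokens), cycle(racks)))

# Eight evenly spaced tokens, matching a run of the tool for 8 nodes.
tokens = [i * RING_SIZE // 8 for i in range(8)]

# Assumed rack layout: two racks with four nodes each.
for token, rack in assign_racks(tokens, ["rack1", "rack2"]):
    print(rack, token)
```

Because the racks alternate around the ring, neighboring tokens always belong to different racks when each rack holds the same number of nodes.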

Calculating Tokens for a Multiple Data Center Cluster

In multiple data center deployments, replica placement must be calculated per data center using the NetworkTopologyStrategy for your custom keyspaces (DataStax Enterprise system keyspaces excluded). This strategy determines replica placement independently within each data center. The first replica is placed according to the partitioner. Additional replicas in the same data center are determined by walking the ring clockwise until a node in a different rack from the previous replica is found. If no such node exists, additional replicas are placed in the same rack. Do not use SimpleStrategy for this type of cluster and be sure to configure the strategy_options for your replication strategy.

There are different methods you can use when calculating tokens for multiple data center clusters. The important point is that the nodes within each data center manage an equal amount of data. The distribution of the nodes within the cluster is not as important. DataStax recommends using DataStax Enterprise OpsCenter to rebalance a cluster.

Alternating Token Assignments

Calculate tokens for each data center using the Token Generating Tool and then alternate the token assignments so that the nodes for each data center are evenly dispersed around the ring.

./tokengentool 3 3

The tool displays the token for each node in each data center:

{
  "0": {
        "0": 0,
        "1": 56713727820156410577229101238628035242,
        "2": 113427455640312821154458202477256070485
        },
  "1": {
        "0": 28356863910078205288614550619314017621,
        "1": 85070591730234615865843651857942052863,
        "2": 141784319550391026443072753096570088106
        }
}

The following image shows the token position and data center assignments:


../../_images/multidc_alternate_tokens.png
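One way to arrive at the alternating values shown above is to compute evenly spaced tokens for the first data center and shift each later data center by a fraction of the gap between neighboring tokens. The Python sketch below reproduces the output for two 3-node data centers. It assumes every data center has the same number of nodes, it is not necessarily how the Token Generating Tool computes its results, and the function name alternating_tokens is hypothetical.

```python
RING_SIZE = 2 ** 127

def alternating_tokens(nodes_per_dc, dc_count, ring_size=RING_SIZE):
    """Return {dc_index: [tokens]} with the data centers interleaved around the ring."""
    base = [i * ring_size // nodes_per_dc for i in range(nodes_per_dc)]
    spacing = ring_size // nodes_per_dc  # gap between neighboring tokens in one data center
    result = {}
    for dc in range(dc_count):
        # Shift each data center by an equal share of the gap.
        shift = dc * spacing // dc_count
        result[dc] = [token + shift for token in base]
    return result

for dc, tokens in alternating_tokens(3, 2).items():
    print(dc, tokens)
```

Sorting all six tokens puts them in the order data center 0, data center 1, data center 0, and so on around the ring.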

Avoiding Token Collisions

To avoid token collisions, offset the token values for each data center. Although you can increment by 1, it is better to use a larger offset value, such as 100, to allow room to replace a dead node.

The following example shows a cluster with two 3-node data centers and one 2-node data center.

./tokengentool 3

{
  "0": {
        "0": 0,
        "1": 56713727820156410577229101238628035242,
        "2": 113427455640312821154458202477256070485
        }
}

./tokengentool 2

{
  "0": {
        "0": 0,
        "1": 85070591730234615865843651857942052864
        }
}

The graphic shows the distribution of the nodes with the associated offsets.


../../_images/multidc_tokens_offset.png
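The offset approach can be sketched in Python as follows. The sketch calculates each data center as its own ring and then adds a per-data-center offset, using the offset of 100 suggested above. The exact offsets in the graphic are not reproduced here, so the assignment of 0, 100, and 200 to the three data centers is an assumption, and offset_tokens is a hypothetical helper.

```python
RING_SIZE = 2 ** 127
OFFSET = 100  # offset between data centers, as suggested in the text

def offset_tokens(nodes_per_dc, offset=OFFSET, ring_size=RING_SIZE):
    """Return {dc_index: [tokens]}, shifting data center n by n * offset."""
    result = {}
    for dc, node_count in enumerate(nodes_per_dc):
        result[dc] = [
            i * ring_size // node_count + dc * offset
            for i in range(node_count)
        ]
    return result

# Two 3-node data centers and one 2-node data center.
tokens = offset_tokens([3, 3, 2])
for dc, dc_tokens in tokens.items():
    print(dc, dc_tokens)

# Every token in the cluster must be unique.
all_tokens = [token for dc_tokens in tokens.values() for token in dc_tokens]
assert len(all_tokens) == len(set(all_tokens))
```

Without the offsets, all three data centers would contain a node with token 0, which is the collision this technique avoids.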