Apache Cassandra 1.1 Documentation

Generating Tokens

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your Cassandra version.

Tokens assign a range of data to a particular node within a data center.

When you start a Cassandra cluster, a partitioner distributes data across the nodes in the cluster based on the row key. You must assign each node in the cluster a token, which determines the node's position in the ring and its range of data. The tokens assigned to your nodes need to be distributed throughout the entire possible range of tokens (0 to 2^127 - 1). Each node is responsible for the region of the ring between itself (inclusive) and its predecessor (exclusive). As a simple example, if the range of possible tokens were 0 to 100 and you had four nodes, the tokens for your nodes should be 0, 25, 50, and 75. This approach ensures that each node is responsible for an equal range of data. When using more than one data center, each data center should be partitioned as if it were its own distinct ring.
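The even spacing and the ownership rule described above can be sketched in Python. This is a minimal illustration, not the token-generator implementation: the function names `evenly_spaced_tokens` and `owner_of` are hypothetical, and the ring size of 2^127 is taken from the token range stated above.

```python
from bisect import bisect_left

RING_SIZE = 2 ** 127  # tokens run from 0 to 2**127 - 1


def evenly_spaced_tokens(node_count, ring_size=RING_SIZE):
    """Return one token per node, spaced evenly around the ring."""
    return [i * ring_size // node_count for i in range(node_count)]


def owner_of(token, node_tokens):
    """Return the token of the node responsible for `token`.

    A node owns the range from its predecessor (exclusive) up to its own
    token (inclusive); tokens above the highest node token wrap around
    to the node with the lowest token.
    """
    ordered = sorted(node_tokens)
    index = bisect_left(ordered, token)
    return ordered[index % len(ordered)]


# The simplified ring from the text: 100 tokens, four nodes.
toy_ring = evenly_spaced_tokens(4, ring_size=100)
print(toy_ring)                # [0, 25, 50, 75]
print(owner_of(10, toy_ring))  # 25
print(owner_of(25, toy_ring))  # 25
print(owner_of(80, toy_ring))  # 0 (wraps around the ring)
```

With the default ring size, `evenly_spaced_tokens(4)` yields four tokens separated by 2^127 / 4.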

Note

Each node in the cluster must be assigned a token before it is started for the first time. The token is set with the initial_token property in the cassandra.yaml configuration file.
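As an illustration of how tokens map onto configuration, the following sketch prints the initial_token line for each node's cassandra.yaml. The helper name `initial_token_lines` is hypothetical, and the tokens are computed by simple even spacing over the 2^127 range rather than by any Cassandra tool.

```python
RING_SIZE = 2 ** 127  # token range, 0 to 2**127 - 1


def initial_token_lines(node_count, ring_size=RING_SIZE):
    """Return one 'initial_token: <value>' line per node, in ring order."""
    tokens = [i * ring_size // node_count for i in range(node_count)]
    return ["initial_token: %d" % token for token in tokens]


# Each printed line belongs in the cassandra.yaml of a different node.
for number, line in enumerate(initial_token_lines(4), start=1):
    print("node %d -> %s" % (number, line))
```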

Token Generating Tool

Cassandra includes a tool for generating tokens using the maximum possible range (0 to 2^127 - 1) for use with the RandomPartitioner.

Usage

  • Packaged installs: token-generator <nodes_in_DC1> <nodes_in_DC2> ...
  • Binary installs: <install_location>/tools/bin/token-generator <nodes_in_DC1> <nodes_in_DC2> ...
  • Interactive Mode: Use token-generator without options and messages will guide you through the process.

The available options are:

Long Option              Short Option  Description
--help                   -h            Show help.
--ringrange <RINGRANGE>                Specify a numeric maximum token value for your ring, if different from the default value of 2^127 - 1.
--graph                                Displays a rendering of the generated tokens as line segments in a circle, colored according to data center.
--nts                    -n            Optimizes multiple cluster distribution for NetworkTopologyStrategy (default).
--onts                   -o            Optimizes multiple cluster distribution for the OldNetworkTopologyStrategy.
--test                                 Run in test mode. Opens Firefox and displays an HTML file that shows various ring arrangements.

Examples

  • Generate tokens for nodes in a single data center:

    ./tools/bin/token-generator 4
    
    Node #1:                                        0
    Node #2:   42535295865117307932921825928971026432
    Node #3:   85070591730234615865843651857942052864
    Node #4:  127605887595351923798765477786913079296
    
  • Generate tokens for multiple data centers using NetworkTopologyStrategy (default):

    ./tools/bin/token-generator 4 4
    
    DC #1:
      Node #1:                                        0
      Node #2:   42535295865117307932921825928971026432
      Node #3:   85070591730234615865843651857942052864
      Node #4:  127605887595351923798765477786913079296
    DC #2:
      Node #1:  169417178424467235000914166253263322299
      Node #2:   41811290829115311202148688466350243003
      Node #3:   84346586694232619135070514395321269435
      Node #4:  126881882559349927067992340324292295867
    

    Replica placement is independent within each data center.

  • Generate tokens for multiple racks in a single data center:

    ./tools/bin/token-generator 8
    
    DC #1:
      Node #1:                                        0
      Node #2:   21267647932558653966460912964485513216
      Node #3:   42535295865117307932921825928971026432
      Node #4:   63802943797675961899382738893456539648
      Node #5:   85070591730234615865843651857942052864
      Node #6:  106338239662793269832304564822427566080
      Node #7:  127605887595351923798765477786913079296
      Node #8:  148873535527910577765226390751398592512
    

    As a best practice, each rack should have the same number of nodes. This allows you to alternate the rack assignments: rack1, rack2, rack3, rack1, rack2, rack3, and so on:


    [Figure: multirack_tokens.png]
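The alternating rack assignment can be sketched as follows. This is an illustration only: the function name `assign_racks` and the rack labels are hypothetical, and the sketch assumes the node count is a multiple of the rack count so that every rack holds the same number of nodes.

```python
from itertools import cycle


def assign_racks(tokens, racks):
    """Pair tokens, in ring order, with racks in a repeating sequence."""
    if len(tokens) % len(racks) != 0:
        raise ValueError("node count should be a multiple of the rack count")
    # zip stops when the tokens run out, so cycle() never loops forever.
    return list(zip(sorted(tokens), cycle(racks)))


# Six evenly spaced tokens on the 2**127 ring, spread over three racks.
tokens = [i * 2 ** 127 // 6 for i in range(6)]
for token, rack in assign_racks(tokens, ["rack1", "rack2", "rack3"]):
    print("%-6s %d" % (rack, token))
```

Walking the ring in token order then visits rack1, rack2, rack3 repeatedly, so neighboring nodes never share a rack.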

Token Assignments when Adding Nodes

When adding nodes to a cluster, you must avoid token collisions. You can do this by offsetting the token values, which allows room for the new nodes.

The following graphic shows an example using an offset of +100:


[Figure: multidc_tokens_offset.png]
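A fixed offset such as the +100 in the example can be sketched as below. This is a simplified illustration, not the algorithm the token-generator uses: the function names `tokens_with_offsets` and `has_collisions` are hypothetical, and shifting each data center by its index times the offset is an assumption made for the sketch.

```python
RING_SIZE = 2 ** 127  # token range, 0 to 2**127 - 1


def tokens_with_offsets(nodes_per_dc, offset=100, ring_size=RING_SIZE):
    """Compute evenly spaced tokens per data center, shifting each
    data center by a multiple of `offset` (assumed scheme)."""
    result = []
    for dc_index, node_count in enumerate(nodes_per_dc):
        shift = dc_index * offset
        result.append([(i * ring_size // node_count + shift) % ring_size
                       for i in range(node_count)])
    return result


def has_collisions(token_lists):
    """Return True if any token appears more than once across all lists."""
    flat = [token for tokens in token_lists for token in tokens]
    return len(flat) != len(set(flat))


dcs = tokens_with_offsets([4, 4])
print(dcs[0][:2])  # first two tokens of the first data center
print(dcs[1][:2])  # same positions in the second data center, shifted by 100
print(has_collisions(dcs))
```

Running a check like `has_collisions` before assigning tokens to new nodes is a cheap way to confirm that no token is already in use.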

Note

Within each data center, it is more important that the nodes manage an equal amount of data than that the nodes are evenly distributed across the cluster as a whole. See balancing the load.