Apache Cassandra 0.7 Documentation

Adding Nodes to a Cluster

To add a node to a Cassandra cluster, first make sure that Cassandra is installed on the new node. Perform all the steps described in the Installing Cassandra section of this document except starting the node: you must complete the configuration steps described below before starting it.

To expand a single node to a two-node cluster, as in the examples on this page, you must edit the configuration file cassandra.yaml. The following values must be specified on both the existing and new nodes:

  • seeds – the list of seeds for the cluster
  • rpc_address and listen_address – network addresses

The following values must be specified on the new node (they should already be set properly on the existing node):

  • initial_token – defining the node’s token range
  • auto_bootstrap – enables auto-migration of data to the new node

More information on setting up Cassandra clusters is available in the Clustering section in the DataStax reference documentation.

Seed List

You must specify at least one node to act as a seed for other hosts joining the ring. When additional hosts are added, the seed nodes provide the information required to join the ring.

Do not think of a seed node as a “master” or “central” node. Seeds are contacted by new nodes joining the ring using the address you provide in the seeds list. At that time, seeds provide the new nodes with information about the ring – which other nodes belong to it, where they are located, and so on. After a node joins the ring, it shares ring information through the gossip protocol, and does not make any further special contact with the seed node.

There is no strict rule to determine which hosts need to be listed as seeds, but all nodes in a cluster should have the same seed list. For a production deployment, DataStax recommends two seeds per data center.
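
The requirement that every node share the same seed list can be checked mechanically before you start any nodes. The following is a minimal sketch, not a Cassandra tool: it assumes each node's cassandra.yaml has already been parsed into a Python dict, and the function name seed_lists_match and the configs structure are hypothetical names chosen for this illustration.

```python
def seed_lists_match(configs):
    """Return True if every node's config lists the same set of seeds.

    `configs` maps a node address to its parsed cassandra.yaml as a dict
    (a hypothetical structure used only for this illustration).
    """
    # Order does not matter for this comparison, so compare as sets.
    seed_sets = {frozenset(cfg.get("seeds", [])) for cfg in configs.values()}
    return len(seed_sets) == 1


# Both nodes of the two-node example point at the same single seed.
configs = {
    "10.203.55.186": {"seeds": ["10.203.55.186"]},
    "10.203.71.154": {"seeds": ["10.203.55.186"]},
}
print(seed_lists_match(configs))
```

If one node listed a different seed, the function would return False, which is a cue to reconcile the files before starting the cluster.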

To configure the seed list:

Edit cassandra.yaml for each node and add the first node (10.203.55.186 in this example) as the seed in each.

seeds:
    - 10.203.55.186

In a small production cluster with the minimum recommended two seeds, the list would contain two IP addresses, each on a separate line (not a comma-separated list).

seeds:
    - 10.203.55.186
    - 10.203.71.154

Note

When you edit the seed list of nodes that are already up and running, you do not need to restart those nodes right away. Running nodes will pick up the new seed entries the next time they are restarted.

Listen Address and RPC Address

Nodes communicate with each other via the gossip protocol and with clients via Thrift, so you must specify the interfaces on which each node listens for both kinds of traffic. Set the rpc_address value to an interface accessible by clients, and the listen_address value to an interface routable from other servers in the cluster.

To configure listen_address and rpc_address settings:

Edit cassandra.yaml on all nodes in the cluster and replace the default localhost entries with the interfaces that will listen for traffic. For the first node in this example:

listen_address: 10.203.55.186
...
rpc_address: 10.203.55.186

and for the second node (10.203.71.154 for this example):

listen_address: 10.203.71.154
...
rpc_address: 10.203.71.154
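
A quick way to catch a node whose addresses were never changed from the defaults is to scan its settings before startup. The sketch below is illustrative only: it assumes the node's cassandra.yaml has been parsed into a dict, and address_problems and LOCAL_ONLY are hypothetical names, not part of Cassandra. It treats a missing value as a problem simply because this section asks for explicit addresses.

```python
# Values that only make a node reachable from itself (assumed defaults).
LOCAL_ONLY = {"localhost", "127.0.0.1"}


def address_problems(node_config):
    """List listen_address / rpc_address settings that still need editing.

    `node_config` is a hypothetical dict standing in for a parsed
    cassandra.yaml; this helper is not part of Cassandra.
    """
    problems = []
    for key in ("listen_address", "rpc_address"):
        value = node_config.get(key)
        if value is None:
            problems.append(key + " is not set")
        elif value in LOCAL_ONLY:
            problems.append(key + " still points at " + value)
    return problems


first_node = {"listen_address": "10.203.55.186", "rpc_address": "10.203.55.186"}
untouched = {"listen_address": "localhost", "rpc_address": "localhost"}
print(address_problems(first_node))  # no problems reported
print(address_problems(untouched))   # both settings reported
```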

Initial Token Values

Whenever you expand the capacity of a Cassandra cluster, Riptano recommends explicitly setting each node’s initial token (initial_token in cassandra.yaml). Every node needs a correctly calculated token for the load to be balanced evenly. The very first node in the cluster, when set properly to zero, will never need its initial_token value edited, but all other tokens must be recalculated every time you expand the cluster.

To determine the correct initial token values for the cluster, divide 2 to the 127th power by the total number of nodes, enumerate the nodes starting with zero, then multiply each node’s number by the quotient. The Cassandra Wiki provides a Python program to calculate new tokens for the nodes. A Python script like the following, when run from a command line, will prompt you for the number of nodes and will print the initial token values:

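#!/usr/bin/env python3
# Print evenly spaced initial_token values for a cluster of a given size.
import sys

# Take the node count from the command line, or prompt for it.
if len(sys.argv) > 1:
        num = int(sys.argv[1])
else:
        num = int(input("How many nodes are in your cluster? "))

# Node i gets i * 2**127 / num; // keeps the arithmetic in exact integers.
for i in range(num):
        print('node %d: %d' % (i, i * (2 ** 127) // num))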

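Once you have calculated the token values, provide the appropriate value for each new node in cassandra.yaml. For a two-node cluster as in our example, the initial token used for the second node is 85070591730234615865843651857942052863. (The formula above yields 2 to the 126th power, 85070591730234615865843651857942052864, for this node; the example cluster uses a value one lower, and the difference has no practical effect on balance.) This assumes that the first node’s value is already correctly set to zero.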
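
To see why evenly spaced tokens matter, it can help to compute how much of the ring each token ends up owning. The sketch below assumes the usual ring model for this token space: a node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around at 2 to the 127th power. The function names balanced_tokens and ownership are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # size of the token space used in this section


def balanced_tokens(node_count):
    """Evenly spaced tokens: node i gets i * 2**127 / node_count."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    Assumes a node owns the range (previous token, own token],
    wrapping around the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index 0 wraps to the last token
        span = (token - previous) % RING_SIZE
        if span == 0:
            span = RING_SIZE  # a single node owns the whole ring
        shares[token] = Fraction(span, RING_SIZE)
    return shares


for count in (2, 3):
    for token, share in ownership(balanced_tokens(count)).items():
        print("%d nodes: token %d owns %.2f%%" % (count, token, float(share) * 100))
```

Under the same assumptions, a second node started with a poorly chosen token such as 2**125 would own only a quarter of the ring while the node at zero kept three quarters. Comparing the two-node and three-node results also shows that every token except zero moves when the cluster grows.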

Autobootstrapping

Autobootstrapping causes a new node in the cluster to automatically migrate the correct range of data from existing nodes, assuming that all initial token values are properly set. In cassandra.yaml for a new node, enable autobootstrapping by setting auto_bootstrap: true (default is false).

Note

An autobootstrapping node cannot have itself in the list of seeds, nor can it use an initial_token already claimed by another node. To add new seeds, autobootstrap the nodes first, and then configure them as seeds.
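
The two restrictions in the note above lend themselves to a pre-flight check. This is a rough sketch under stated assumptions: the new node's cassandra.yaml is represented as a dict, the tokens already in use are known to you, and bootstrap_problems is a hypothetical helper rather than anything Cassandra provides.

```python
def bootstrap_problems(new_node, claimed_tokens):
    """Return reasons why `new_node` should not autobootstrap as configured.

    `new_node` is a hypothetical dict standing in for the node's parsed
    cassandra.yaml; `claimed_tokens` holds tokens already used in the ring.
    """
    problems = []
    if not new_node.get("auto_bootstrap"):
        return problems  # the restrictions only apply when bootstrapping
    if new_node.get("listen_address") in new_node.get("seeds", []):
        problems.append("node lists itself as a seed")
    if new_node.get("initial_token") in claimed_tokens:
        problems.append("initial_token is already claimed")
    return problems


second_node = {
    "auto_bootstrap": True,
    "listen_address": "10.203.71.154",
    "seeds": ["10.203.55.186"],
    "initial_token": 85070591730234615865843651857942052863,
}
# The first node holds token 0, so the second node's settings pass.
print(bootstrap_problems(second_node, claimed_tokens={0}))
```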

Starting a Cassandra Cluster

Start the seed node, and verify connectivity with nodetool ring as in the single node example above. Then start the remaining node. The nodes need a few minutes to exchange data; you can follow the progress on the second node in the system log, located by default at /var/log/cassandra/system.log. After that, running nodetool ring again should give you something like the following:

~$ nodetool -h localhost -p 8080 ring
   Address         Status  State   Load        Owns     Range                                      Ring
                                                        85070591730234615865843651857942052863
   10.203.71.154   Up      Normal  2.53 KB     50.00    0                                          |<--|
   10.203.55.186   Up      Normal  1.33 KB     50.00    85070591730234615865843651857942052863     |-->|
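
If you want to script the check that every node has come up, the ring listing can be parsed as text. The sketch below assumes the column layout shown in the sample output above (address, status, and state as the first three fields of each node row); other Cassandra versions may format this output differently. The names parse_ring and all_up_and_normal are hypothetical.

```python
SAMPLE = """\
   Address         Status  State   Load        Owns     Range                                      Ring
                                                        85070591730234615865843651857942052863
   10.203.71.154   Up      Normal  2.53 KB     50.00    0                                          |<--|
   10.203.55.186   Up      Normal  1.33 KB     50.00    85070591730234615865843651857942052863     |-->|
"""


def parse_ring(output):
    """Extract address, status, and state from ring output in the layout above."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Node rows have a status of Up or Down in the second column;
        # the header row and the lone token row do not.
        if len(fields) >= 3 and fields[1] in ("Up", "Down"):
            nodes.append(
                {"address": fields[0], "status": fields[1], "state": fields[2]}
            )
    return nodes


def all_up_and_normal(nodes):
    """True when at least one node was found and all are Up and Normal."""
    return bool(nodes) and all(
        n["status"] == "Up" and n["state"] == "Normal" for n in nodes
    )


nodes = parse_ring(SAMPLE)
print([n["address"] for n in nodes], all_up_and_normal(nodes))
```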

Starting a Node Outside the Ring

If you want to start an instance of Cassandra without having it join the ring, add an entry like the following in $CASSANDRA_HOME/conf/cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"

This can be useful for maintaining “warm spare” nodes that can be added to the ring as needed, or for performing JMX maintenance before joining the ring. When you want a warm spare to join the ring, use the nodetool join command.

Next Steps

After starting up the nodes in your Cassandra cluster, you are ready to take the next steps toward a full Cassandra deployment.