Apache Cassandra 1.0 Documentation

Initializing a Multiple Node Cluster in a Single Data Center

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to the version you are running.

In this scenario, data replication is distributed across a single data center.

Data replicates across the data centers automatically and transparently; no ETL work is necessary to move data between different systems or servers. You can configure the number of copies of the data in each data center and Cassandra handles the rest, replicating the data for you. To configure a multiple data center cluster, see Initializing Multiple Data Center Clusters on Cassandra.
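
For example, the number of replicas is set per keyspace when you create the keyspace. The following is a rough sketch using the cqlsh tool shipped with Cassandra 1.0; the keyspace name demo and the replication factor of 3 are placeholder values, and the exact syntax depends on the CQL version in use:

  $ bin/cqlsh 110.82.155.0
  cqlsh> CREATE KEYSPACE demo WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 3;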

Note

In Cassandra, the term data center refers to a grouping of nodes; it is synonymous with replication group, that is, a set of nodes configured together for replication purposes. Replication protects against hardware failure and other problems that can cause data loss in a single cluster.

Prerequisites

To correctly configure a multi-node cluster, you need the following information: a name for the cluster, the IP address of each node, which nodes will serve as the seed nodes, the initial token for each node, and the snitch you plan to use.

This information is used to configure the Node and Cluster Initialization Properties in the cassandra.yaml configuration file on each node in the cluster. Each node must be correctly configured before you start the cluster.

Configuration Example

This example describes installing a six-node cluster spanning two racks in a single data center.

You set properties for each node in the cassandra.yaml file. The location of this file depends on the type of installation; see Cassandra Configuration Files Locations or DataStax Enterprise Configuration Files Locations.

Note

After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.

To configure the cluster:

  1. The nodes have the following IPs, and one node per rack will serve as a seed:

    • node0 110.82.155.0 (seed1)
    • node1 110.82.155.1
    • node2 110.82.155.2
    • node3 110.82.156.3 (seed2)
    • node4 110.82.156.4
    • node5 110.82.156.5
  2. Calculate the token assignments using the Token Generating Tool.

    Node    Token
    node0   0
    node1   28356863910078205288614550619314017621
    node2   56713727820156410577229101238628035242
    node3   85070591730234615865843651857942052864
    node4   113427455640312821154458202477256070485
    node5   141784319550391026443072753096570088106
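
    These token values can also be reproduced with a quick one-liner. This is only a sanity check; it assumes the default RandomPartitioner, whose tokens are evenly spaced over the range 0 to 2**127:

      $ python -c "n = 6; print('\n'.join(str(i * 2**127 // n) for i in range(n)))"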
  3. If you have a firewall running on the nodes in your cluster, you must open certain ports to allow communication between the nodes. See Configuring Firewall Port Access.
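
    For example, with iptables you might open the default Cassandra 1.0 ports: 7000 for internode (gossip) traffic, 7199 for JMX (used by nodetool), and 9160 for Thrift client connections. This is only a sketch; adapt the source ranges and rules to your own network policy:

      $ sudo iptables -A INPUT -p tcp -s 110.82.155.0/24 --dport 7000 -j ACCEPT
      $ sudo iptables -A INPUT -p tcp -s 110.82.156.0/24 --dport 7000 -j ACCEPT
      $ sudo iptables -A INPUT -p tcp --dport 7199 -j ACCEPT
      $ sudo iptables -A INPUT -p tcp --dport 9160 -j ACCEPT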

  4. Stop the nodes and clear the data.

    • For packaged installs, run the following commands:

      $ sudo service cassandra stop (stops the service)

      $ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)

    • For binary installs, run the following commands from the install directory:

      $ ps auwx | grep cassandra (finds the Cassandra Java process ID [PID])

      $ sudo kill <pid> (stops the process)

      $ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)

  5. Modify the following property settings in the cassandra.yaml file for each node:

    Note

    In the - seeds list property, include the internal IP addresses of each seed node.

    node0

    cluster_name: 'MyDemoCluster'
    initial_token: 0
    seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
             - seeds: "110.82.155.0,110.82.156.3"
    listen_address: 110.82.155.0
    rpc_address: 0.0.0.0
    endpoint_snitch: RackInferringSnitch
    
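    The RackInferringSnitch infers each node's data center and rack from the second and third octets of its IP address, which is why no further topology configuration is needed for this layout. For example (illustrative only):

      110.82.155.1  ->  data center 82, rack 155
      110.82.156.4  ->  data center 82, rack 156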

    node1 to node5

    The properties for the rest of the nodes are the same as node0 except for initial_token and listen_address:

    node1

    initial_token: 28356863910078205288614550619314017621
    listen_address: 110.82.155.1
    

    node2

    initial_token: 56713727820156410577229101238628035242
    listen_address: 110.82.155.2
    

    node3

    initial_token: 85070591730234615865843651857942052864
    listen_address: 110.82.156.3
    

    node4

    initial_token: 113427455640312821154458202477256070485
    listen_address: 110.82.156.4
    

    node5

    initial_token: 141784319550391026443072753096570088106
    listen_address: 110.82.156.5
    
  6. After you have installed and configured Cassandra on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.

    Note

    If a node has restarted because of an automatic restart, you must stop it and clear its data directories, as described above.

    • Packaged installs: sudo service cassandra start

    • Binary installs: run one of the following commands from the install directory:

      bin/cassandra (starts in the background)

      bin/cassandra -f (starts in the foreground)
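
    To confirm that a node has joined the ring before starting the next one, you can tail its log. The path below assumes the default location used by packaged installs:

      $ tail -f /var/log/cassandra/system.log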

  7. Check that your ring is up and running:

    • Packaged installs: nodetool ring -h localhost

    • Binary installs:

      $ cd <install_directory>

      $ bin/nodetool ring -h localhost

    The ring status is displayed. This gives you an idea of how load is balanced within the ring and whether any nodes are down. If your cluster is not properly configured, different nodes may show a different ring; checking that every node reports the same ring is a good way to verify the configuration.
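
    As a rough sanity check, every node should report a Status of Up and a State of Normal. Assuming the default ring output format, a quick count such as the following should match the cluster size:

      $ bin/nodetool ring -h localhost | grep -c Up    # expect 6 for this example cluster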