In this scenario, data replication is distributed across a single data center.
Data replicates across the data centers automatically and transparently; no ETL work is necessary to move data between different systems or servers. You can configure the number of copies of the data in each data center and Cassandra handles the rest, replicating the data for you. To configure a multiple data center cluster, see Initializing Multiple Data Center Clusters on Cassandra.
Note
In Cassandra, the term data center is a grouping of nodes. Data center is synonymous with replication group, that is, a grouping of nodes configured together for replication purposes. The data replication protects against hardware failure and other problems that cause data loss in a single cluster.
Correctly configuring a multi-node cluster requires the following:
This information is used to configure the Node and Cluster Initialization Properties in the cassandra.yaml configuration file on each node in the cluster. Each node should be correctly configured before starting up the cluster.
This example describes installing a six node cluster spanning two racks in a single data center.
You set properties for each node in the cassandra.yaml file. The location of this file depends on the type of installation; see Cassandra Configuration Files Locations or DataStax Enterprise Configuration Files Locations.
Note
After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.
To configure the cluster:
The nodes have the following IPs, and one node per rack will serve as a seed:
Calculate the token assignments using the Token Generating Tool.
| Node | Token |
|---|---|
| node0 | 0 |
| node1 | 28356863910078205288614550619314017621 |
| node2 | 56713727820156410577229101238628035242 |
| node3 | 85070591730234615865843651857942052864 |
| node4 | 113427455640312821154458202477256070485 |
| node5 | 141784319550391026443072753096570088106 |
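The values in the table are consistent with dividing a ring of 2^127 positions evenly among the six nodes. The following is a minimal sketch of that arithmetic, not the Token Generating Tool itself; the function name `evenly_spaced_tokens` and the default ring size are assumptions made for illustration.

```python
def evenly_spaced_tokens(node_count, ring_size=2**127):
    """Return one token per node, spaced evenly over the ring.

    Assumes a token range of 0 to 2**127, which reproduces the
    values in the table above for six nodes.
    """
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print a node/token pair for each of the six nodes in the example.
    for i, token in enumerate(evenly_spaced_tokens(6)):
        print(f"node{i}: {token}")
```

Integer division keeps every token a whole number, so the spacing between neighbors can differ by one position.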
If you have a firewall running on the nodes in your cluster, you must open certain ports to allow communication between the nodes. See Configuring Firewall Port Access.
Stop the nodes and clear the data.
For packaged installs, run the following commands:
$ sudo service cassandra stop (stops the service)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
For binary installs, run the following commands from the install directory:
$ ps auwx | grep cassandra (finds the Cassandra Java process ID [PID])
$ sudo kill <pid> (stops the process)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
Modify the following property settings in the cassandra.yaml file for each node:
Note
In the - seeds list property, include the internal IP addresses of each seed node.
node0
cluster_name: 'MyDemoCluster'
initial_token: 0
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node1 to node5
The properties for the rest of the nodes are the same as node0 except for the initial_token and listen_address:
node1
initial_token: 28356863910078205288614550619314017621
listen_address: 110.82.155.1
node2
initial_token: 56713727820156410577229101238628035242
listen_address: 110.82.155.2
node3
initial_token: 85070591730234615865843651857942052864
listen_address: 110.82.155.3
node4
initial_token: 113427455640312821154458202477256070485
listen_address: 110.82.155.4
node5
initial_token: 141784319550391026443072753096570088106
listen_address: 110.82.155.5
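Because only initial_token and listen_address differ between nodes, the per-node settings can be generated rather than typed by hand. The sketch below is one hedged way to do that, using the addresses and tokens from this example; `NODES` and `render_node_settings` are hypothetical names, and the output is meant to be reviewed and merged into each node's cassandra.yaml manually.

```python
# Values copied from the example; the helper names are hypothetical.
CLUSTER_NAME = "MyDemoCluster"
SEEDS = "110.82.155.0,110.82.155.3"

# (listen_address, initial_token) for node0 through node5.
NODES = [
    ("110.82.155.0", 0),
    ("110.82.155.1", 28356863910078205288614550619314017621),
    ("110.82.155.2", 56713727820156410577229101238628035242),
    ("110.82.155.3", 85070591730234615865843651857942052864),
    ("110.82.155.4", 113427455640312821154458202477256070485),
    ("110.82.155.5", 141784319550391026443072753096570088106),
]


def render_node_settings(index):
    """Return the cassandra.yaml settings for one node as text."""
    listen_address, token = NODES[index]
    lines = [
        f"cluster_name: '{CLUSTER_NAME}'",
        f"initial_token: {token}",
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{SEEDS}"',
        f"listen_address: {listen_address}",
        "rpc_address: 0.0.0.0",
        "endpoint_snitch: RackInferringSnitch",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Show the settings for node4 as an example.
    print(render_node_settings(4))
```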
After you have installed and configured Cassandra on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.
Note
If the node has restarted because of automatic restart, you must stop the node and clear the data directories, as described above.
Packaged installs: sudo service cassandra start
Binary installs, run one of the following commands from the install directory:
bin/cassandra (starts in the background)
bin/cassandra -f (starts in the foreground)
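The startup order described in this step (seed nodes first, then the remaining nodes) can be sketched as a small helper. This is only an illustration of the ordering; `start_order` is a hypothetical name, and actually starting each node still uses the commands shown above.

```python
def start_order(listen_addresses, seeds):
    """Return the addresses with seed nodes first, then the rest.

    The relative order within each group is preserved.
    """
    seed_set = set(seeds)
    seed_nodes = [a for a in listen_addresses if a in seed_set]
    other_nodes = [a for a in listen_addresses if a not in seed_set]
    return seed_nodes + other_nodes


if __name__ == "__main__":
    addresses = [f"110.82.155.{i}" for i in range(6)]
    seeds = ["110.82.155.0", "110.82.155.3"]
    # Start the nodes one at a time in this order.
    for address in start_order(addresses, seeds):
        print(address)
```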
Check that your ring is up and running:
Packaged installs: nodetool ring -h localhost
Binary installs:
$ cd /<install_directory>
$ bin/nodetool ring -h localhost
The ring status is displayed. It shows how the load is balanced within the ring and whether any nodes are down. If your cluster is not properly configured, different nodes may show a different ring, so running the command on each node is a good way to check that every node views the ring the same way.
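The consistency check described above can be sketched as a comparison of the ring each node reports. The example assumes you have already collected, by whatever means, the (address, token) pairs that each node shows; it does not parse nodetool output, and `nodes_with_divergent_view` is a hypothetical name.

```python
from collections import Counter


def nodes_with_divergent_view(views):
    """Return the nodes whose ring differs from the most common view.

    `views` maps a node name to the (address, token) pairs that node
    reports for the ring. An empty result means every node agrees.
    """
    normalized = {node: frozenset(ring) for node, ring in views.items()}
    if not normalized:
        return []
    majority, _ = Counter(normalized.values()).most_common(1)[0]
    return sorted(node for node, ring in normalized.items() if ring != majority)


if __name__ == "__main__":
    full_ring = [
        ("110.82.155.0", 0),
        ("110.82.155.1", 28356863910078205288614550619314017621),
    ]
    views = {
        "node0": full_ring,
        "node1": full_ring,
        "node2": full_ring[:1],  # this node does not see 110.82.155.1
    }
    print(nodes_with_divergent_view(views))
```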