In this scenario, a mixed workload cluster has more than one data center for each type of node. For example, if the cluster has 4 Hadoop nodes, 4 Cassandra nodes, and 2 Solr nodes, the cluster could have 5 data centers: 2 data centers for Hadoop nodes, 2 data centers for Cassandra nodes, and 1 data center for Solr nodes. A single data center cluster has only 1 data center for each type of node.
Data replication can be distributed across multiple, geographically dispersed data centers, between different physical racks in a data center, or between public cloud providers and on-premises managed data centers. Data replicates across the data centers automatically and transparently; no ETL work is necessary to move data between different systems or servers. You configure the number of copies of the data in each data center, and Cassandra handles the rest, replicating the data for you. To configure a single data center cluster, see Single Data Center Deployment.
Correctly configuring a multi-node cluster with multiple data centers requires:
This information is used to configure the following properties on each node in the cluster:
This example describes installing a six-node cluster spanning two data centers. The steps for configuring multiple data centers on binary and packaged installations are the same, except that the configuration files are located in different directories.
Location of the property files in packaged installations:
Location of the property files in binary installations:
Note
After changing properties in these files, you must restart the node for the changes to take effect.
To configure a cluster with multiple data centers:
Suppose you install DataStax Enterprise on these nodes:
10.168.66.41
10.176.43.66
10.168.247.41
10.176.170.59
10.169.61.170
10.169.30.138
Assign tokens so that data is evenly distributed within each data center: calculate the token assignments with the Token Generating Tool, and offset the tokens for the second data center so that no two nodes in the cluster have the same token:

| Node | IP Address | Token | Offset | Data Center |
|---|---|---|---|---|
| node0 | 10.168.66.41 | 0 | NA | DC1 |
| node1 | 10.176.43.66 | 56713727820156410577229101238628035242 | NA | DC1 |
| node2 | 10.168.247.41 | 113427455640312821154458202477256070485 | NA | DC1 |
| node3 | 10.176.170.59 | 10 | 10 | DC2 |
| node4 | 10.169.61.170 | 56713727820156410577229101238628035252 | 10 | DC2 |
| node5 | 10.169.30.138 | 113427455640312821154458202477256070495 | 10 | DC2 |
For more information, see Calculating Tokens for a Multiple Data Center Cluster.
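The token values in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the Token Generating Tool itself: it assumes the RandomPartitioner token range of 0 to 2**127 (which the table's values imply), places node i of N in a data center at floor(i × 2**127 / N), and then adds that data center's offset. The helper name `dc_tokens` is hypothetical.

```python
"""Sketch: reproduce the per-data-center token assignments from the table.

Assumes RandomPartitioner (tokens in 0 .. 2**127 - 1). dc_tokens is a
hypothetical helper, not part of DataStax Enterprise or Cassandra.
"""

RING_SIZE = 2 ** 127  # size of the assumed RandomPartitioner token range


def dc_tokens(num_nodes: int, offset: int = 0) -> list:
    """Return evenly spaced tokens for one data center, shifted by offset.

    Multiplying before the integer division keeps the arithmetic exact
    (floor of i * 2**127 / num_nodes), which matches the table values.
    """
    return [i * RING_SIZE // num_nodes + offset for i in range(num_nodes)]


if __name__ == "__main__":
    # Each data center gets its own evenly balanced ring; DC2 is shifted
    # by 10 so that no token is used by more than one node in the cluster.
    for dc_name, offset in (("DC1", 0), ("DC2", 10)):
        for index, token in enumerate(dc_tokens(3, offset)):
            print(f"{dc_name} node {index}: {token}")
```

The offset value itself is arbitrary; it only needs to keep tokens unique. A small offset such as 10 leaves the spacing inside each data center unchanged, so each data center stays evenly balanced.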
Stop the nodes and clear the data.
For packaged installs, run the following commands:
$ sudo service dse stop (stops the service)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
For binary installs, run the following commands from the install directory:
$ ps auwx | grep dse (finds the Cassandra and DataStax Enterprise Java process ID [PID])
$ sudo kill <pid> (stops the process)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
Modify the following property settings in the cassandra.yaml file for each node:
node0:
initial_token: 0
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.168.66.41,10.176.170.59"
listen_address: 10.168.66.41
Note
You must include at least one node from each data center in the seeds list. It is a best practice to have more than one seed node per data center.
node1 to node5
The properties for the rest of the nodes are the same as for node0, except for initial_token and listen_address:

| Node | initial_token | listen_address |
|---|---|---|
| node1 | 56713727820156410577229101238628035242 | 10.176.43.66 |
| node2 | 113427455640312821154458202477256070485 | 10.168.247.41 |
| node3 | 10 | 10.176.170.59 |
| node4 | 56713727820156410577229101238628035252 | 10.169.61.170 |
| node5 | 113427455640312821154458202477256070495 | 10.169.30.138 |
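Because only initial_token and listen_address vary from node to node, the settings for all six nodes can be generated from the token table. The following Python sketch is one way to do that, assuming the node list used in this example. `NODES`, `SEEDS`, `render_node_settings`, and `seeds_cover_every_dc` are hypothetical names. The script only prints text for you to merge into each node's cassandra.yaml; it does not edit any files. It also checks the seed rule from the note above.

```python
"""Sketch: render the per-node cassandra.yaml settings for this example.

NODES mirrors the token table; all names here are hypothetical helpers,
not DataStax tooling. Only the properties this procedure changes are emitted.
"""

# (node name, IP address, initial_token, data center) taken from the token table
NODES = [
    ("node0", "10.168.66.41", 0, "DC1"),
    ("node1", "10.176.43.66", 56713727820156410577229101238628035242, "DC1"),
    ("node2", "10.168.247.41", 113427455640312821154458202477256070485, "DC1"),
    ("node3", "10.176.170.59", 10, "DC2"),
    ("node4", "10.169.61.170", 56713727820156410577229101238628035252, "DC2"),
    ("node5", "10.169.30.138", 113427455640312821154458202477256070495, "DC2"),
]

SEEDS = ["10.168.66.41", "10.176.170.59"]  # one seed from each data center


def seeds_cover_every_dc(seeds, nodes):
    """Return True if the seed list contains at least one node from every data center."""
    dc_of = {ip: dc for _, ip, _, dc in nodes}
    seed_dcs = {dc_of.get(ip) for ip in seeds}  # unknown seed IPs map to None
    return set(dc_of.values()) <= seed_dcs


def render_node_settings(ip, token, seeds):
    """Return the cassandra.yaml lines for one node (the seed list is shared by all nodes)."""
    return "\n".join([
        f"initial_token: {token}",
        "seed_provider:",
        "  - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "    parameters:",
        f'      - seeds: "{",".join(seeds)}"',
        f"listen_address: {ip}",
    ])


if __name__ == "__main__":
    if not seeds_cover_every_dc(SEEDS, NODES):
        raise SystemExit("The seed list must include a node from every data center")
    for name, ip, token, _dc in NODES:
        print(f"# {name}")
        print(render_node_settings(ip, token, SEEDS))
        print()
```

Keeping one node table as the single source of truth makes it harder for the token, listen address, and data center of a node to drift apart across the different configuration files.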
In the dse.yaml file on each node, specify the snitch that the DseDelegateSnitch delegates to. For more information about snitches, see About Snitches. For example, to specify the PropertyFileSnitch, enter:
delegated_snitch: org.apache.cassandra.locator.PropertyFileSnitch
Determine a naming convention for data centers and racks, for example DC1, DC2 or 100, 200 for data centers, and RAC1, RAC2 or R101, R102 for racks.
In the cassandra-topology.properties file, assign data center and rack names to the IP addresses of each node, and assign a default data center name and rack name for unknown nodes. For example:
# Cassandra Node IP=Data Center:Rack
10.168.66.41=DC1:RAC1
10.176.43.66=DC1:RAC1
10.168.247.41=DC1:RAC1
10.176.170.59=DC2:RAC1
10.169.61.170=DC2:RAC1
10.169.30.138=DC2:RAC1
# default for unknown nodes
default=DC1:RAC1
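The lookup rule described above (an exact IP address match, otherwise the default entry) can be sketched in Python. This is an illustration only, not the PropertyFileSnitch source: `parse_topology` and `locate` are hypothetical helpers, and the parser handles just the simple `IP=DataCenter:Rack` lines and `#` comments used in this example, not the full Java properties syntax.

```python
"""Sketch: resolve a node's data center and rack from cassandra-topology.properties text.

parse_topology and locate are hypothetical helpers that mimic the lookup rule
(exact IP match, else the default entry); they are not Cassandra code.
"""

TOPOLOGY = """\
# Cassandra Node IP=Data Center:Rack
10.168.66.41=DC1:RAC1
10.176.43.66=DC1:RAC1
10.168.247.41=DC1:RAC1
10.176.170.59=DC2:RAC1
10.169.61.170=DC2:RAC1
10.169.30.138=DC2:RAC1
# default for unknown nodes
default=DC1:RAC1
"""


def parse_topology(text):
    """Map each key (an IP address or 'default') to a (data center, rack) pair."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, location = line.partition("=")
        dc, _, rack = location.partition(":")
        mapping[key.strip()] = (dc.strip(), rack.strip())
    return mapping


def locate(mapping, ip):
    """Return the node's (data center, rack), falling back to the default entry."""
    return mapping.get(ip, mapping["default"])


if __name__ == "__main__":
    topology = parse_topology(TOPOLOGY)
    print(locate(topology, "10.176.170.59"))  # listed node in DC2
    print(locate(topology, "10.0.0.1"))       # unlisted node: default entry
```

Running a check like this against the token table before starting the nodes is a cheap way to catch an IP address assigned to the wrong data center.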
After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.
Note
If a node restarts automatically, you must stop it and clear its data directories, as described above.
Check that your ring is up and running:
$ cd /<install_directory>
$ bin/nodetool ring -h localhost
Links to more information about configuring a data center: