You can initialize a Cassandra cluster with one or more data centers. Data replicates across the data centers automatically and transparently; no ETL work is necessary to move data between different systems or servers. You configure the number of copies of the data in each data center, and Cassandra handles the replication for you.
Note
In Cassandra, the term data center refers to a grouping of nodes. Data center is synonymous with replication group, that is, a grouping of nodes configured together for replication purposes.
Each node must be correctly configured before starting the cluster. You must determine or perform the following before starting the cluster:
The following examples demonstrate initializing Cassandra:
This example describes installing a six node cluster spanning two racks in a single data center. Each node is configured to use the RackInferringSnitch (multiple rack aware) and 256 virtual nodes (recommended).
It is recommended to have more than one seed node per data center.
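As a rough illustration of what "rack inferring" means here: RackInferringSnitch reads the data center from the second octet of a node's IP address and the rack from the third octet. The Python sketch below mimics that rule for planning purposes only; the function name infer_location is hypothetical, and the second address is an assumed example, not one of the nodes in this procedure.

```python
def infer_location(ip):
    """Return (data_center, rack) the way RackInferringSnitch reads an IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted IPv4 address: {ip!r}")
    # Second octet identifies the data center, third octet identifies the rack.
    return octets[1], octets[2]


# 110.82.155.0 is node0 in this example; 110.82.156.3 is a hypothetical
# address used only to show a node that lands in a different rack.
for address in ("110.82.155.0", "110.82.156.3"):
    dc, rack = infer_location(address)
    print(f"{address}: data center {dc}, rack {rack}")
```

Running a planned address list through a check like this before starting any node can show whether the addressing scheme really places nodes in the racks you intend.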
To initialize the cluster:
Set properties for each node in the cassandra.yaml file. The location of this file depends on the type of installation; see Cassandra Configuration Files Locations or DataStax Enterprise Configuration Files Locations.
Note
After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.
Suppose you install Cassandra on these nodes with one node per rack serving as a seed:
It is a best practice to have more than one seed node per data center.
If you have a firewall running on the nodes in your cluster, you must open certain ports to allow communication between the nodes. See Configuring firewall port access.
If Cassandra is running, stop the node and clear the data.
For packaged installs, run the following commands:
$ sudo service cassandra stop (stops the service)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
For binary installs, run the following commands from the install directory:
$ ps auwx | grep cassandra (finds the Cassandra Java process ID [PID])
$ sudo kill <pid> (stops the process)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
Modify the following property settings in the cassandra.yaml file for each node:
node0
cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node1 to node5
The properties for these nodes are the same as node0 except for the listen_address.
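Because only listen_address differs from node to node, the per-node settings can be generated from one template. The following Python sketch is one possible way to do that, not part of the documented procedure. The seed addresses and the node0 address come from the settings above; the other four addresses in NODE_ADDRESSES are assumptions made for illustration, and render_node_settings is a hypothetical helper name.

```python
# Template holding the settings shown for node0; only two values vary.
TEMPLATE = """\
cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{seeds}"
listen_address: {listen_address}
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
"""

SEEDS = ["110.82.155.0", "110.82.155.3"]  # from the node0 settings

# Only .0 and .3 appear in this document; the rest are hypothetical.
NODE_ADDRESSES = [
    "110.82.155.0", "110.82.155.1", "110.82.155.2",
    "110.82.155.3", "110.82.155.4", "110.82.155.5",
]


def render_node_settings(listen_address, seeds):
    """Return the settings block for one node as text."""
    return TEMPLATE.format(seeds=",".join(seeds), listen_address=listen_address)


# One settings block per node, identical except for listen_address.
settings = {addr: render_node_settings(addr, SEEDS) for addr in NODE_ADDRESSES}
print(settings["110.82.155.3"])
```

The output is text to merge into each node's cassandra.yaml by hand or with your own tooling; the sketch does not write any files.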
After you have installed and configured Cassandra on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.
Note
If the node has restarted because of automatic restart, you must stop the node and clear the data directories, as described above.
Packaged installs: sudo service cassandra start
Binary installs, run one of the following commands from the install directory:
bin/cassandra (starts in the background)
bin/cassandra -f (starts in the foreground)
To check that the ring is up and running, run the nodetool status command.
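If you want to script this check, one option is to read the two-letter code that begins each node line in the nodetool status output: the first letter is the status (U for Up, D for Down) and the second is the state (N for Normal, L for Leaving, J for Joining, M for Moving). The Python sketch below assumes that layout and that the node address is the second column; the function names are hypothetical, and the column layout should be verified against the output of your own Cassandra version.

```python
import re
import subprocess

# Status letter (Up/Down) followed by state letter (Normal/Leaving/Joining/Moving),
# then whitespace and the node address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output):
    """Map each node address to its two-letter status/state code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            states[match.group(2)] = match.group(1)
    return states


def all_up_and_normal(status_output, expected_nodes):
    """True only if every expected node is listed with the code UN."""
    states = node_states(status_output)
    return all(states.get(node) == "UN" for node in expected_nodes)


def read_status():
    """Run nodetool status on this host and return its text output."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A typical use would be all_up_and_normal(read_status(), addresses), where addresses is the list of node IPs you expect in the ring.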
This example describes installing a six node cluster spanning two data centers. Each node is configured to use the PropertyFileSnitch (uses a user-defined description of the network details) and 256 virtual nodes (recommended).
It is recommended to have more than one seed node per data center.
To configure a cluster with multiple data centers:
Set properties for each node in the cassandra.yaml and cassandra-topology.properties files. The location of these files depends on the type of installation; see Cassandra Configuration Files Locations or DataStax Enterprise Configuration Files Locations.
Note
After changing properties in these files, you must restart the node for the changes to take effect.
Suppose you install Cassandra on these nodes:
If you have a firewall running on the nodes in your cluster, you must open certain ports to allow communication between the nodes. See Configuring firewall port access.
If Cassandra is running, stop the node and clear the data.
For packaged installs, run the following commands:
$ sudo service cassandra stop (stops the service)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
For binary installs, run the following commands from the install directory:
$ ps auwx | grep cassandra (finds the Cassandra Java process ID [PID])
$ sudo kill <pid> (stops the process)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
Modify the following property settings in the cassandra.yaml file for each node:
node0:
cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.168.66.41,10.176.170.59"
listen_address: 10.168.66.41
endpoint_snitch: PropertyFileSnitch
Note
Include at least one node from each data center.
node1 to node5
The properties for these nodes are the same as node0 except for the listen_address.
In the cassandra-topology.properties file, assign the data center and rack names you determined in the Prerequisites to the IP addresses of each node. For example:
# Cassandra Node IP=Data Center:Rack
10.168.66.41=DC1:RAC1
10.176.43.66=DC2:RAC1
10.168.247.41=DC1:RAC1
10.176.170.59=DC2:RAC1
10.169.61.170=DC1:RAC1
10.169.30.138=DC2:RAC1
Also, in the cassandra-topology.properties file, assign a default data center name and rack name for unknown nodes.
# default for unknown nodes
default=DC1:RAC1
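The topology entries above, together with the requirement that the seed list include at least one node from each data center, lend themselves to a quick consistency check. The Python sketch below parses the IP=Data Center:Rack lines and reports any data center that has no seed. It is a minimal illustration that assumes IPv4 entries only; parse_topology and data_centers_without_seed are hypothetical helper names.

```python
TOPOLOGY = """\
# Cassandra Node IP=Data Center:Rack
10.168.66.41=DC1:RAC1
10.176.43.66=DC2:RAC1
10.168.247.41=DC1:RAC1
10.176.170.59=DC2:RAC1
10.169.61.170=DC1:RAC1
10.169.30.138=DC2:RAC1
# default for unknown nodes
default=DC1:RAC1
"""

SEEDS = ["10.168.66.41", "10.176.170.59"]  # from the seeds setting in cassandra.yaml


def parse_topology(text):
    """Map each key (an IP or 'default') to a (data_center, rack) pair."""
    topology = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        dc, _, rack = value.partition(":")
        topology[key.strip()] = (dc.strip(), rack.strip())
    return topology


def data_centers_without_seed(topology, seeds):
    """Return the set of data centers that contain no seed node."""
    all_dcs = {dc for ip, (dc, _) in topology.items() if ip != "default"}
    seeded = {topology[ip][0] for ip in seeds if ip in topology}
    return all_dcs - seeded


missing = data_centers_without_seed(parse_topology(TOPOLOGY), SEEDS)
print("Data centers without a seed:", sorted(missing))
```

With the entries above, both DC1 and DC2 contain a seed, so the reported list is empty.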
After you have installed and configured Cassandra on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.
Note
If the node has restarted because of automatic restart, you must stop the node and clear the data directories, as described above.
Packaged installs: sudo service cassandra start
Binary installs, run one of the following commands from the install directory:
bin/cassandra (starts in the background)
bin/cassandra -f (starts in the foreground)
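The start order described above (seed nodes first, one at a time, then the remaining nodes) can be computed from the node list and the seed list. The Python sketch below only produces the order; how the start command is run on each host is left out and depends on your environment. The name start_order is hypothetical, and the addresses are taken from the cassandra-topology.properties example.

```python
def start_order(nodes, seeds):
    """Return nodes with the seed nodes first, preserving the given order otherwise."""
    seed_set = set(seeds)
    first = [node for node in nodes if node in seed_set]
    rest = [node for node in nodes if node not in seed_set]
    return first + rest


NODES = [
    "10.168.66.41", "10.176.43.66", "10.168.247.41",
    "10.176.170.59", "10.169.61.170", "10.169.30.138",
]
SEEDS = ["10.168.66.41", "10.176.170.59"]

for position, node in enumerate(start_order(NODES, SEEDS), start=1):
    role = "seed" if node in SEEDS else "non-seed"
    print(f"{position}. start {node} ({role})")
```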
To check that the ring is up and running, run the nodetool status command.
Links to more information about configuring a data center: