Apache Cassandra 0.7 Documentation

Multiple Data Centers

Cassandra is designed as a distributed system, for deployment of large numbers of nodes across multiple data centers. Among the settings that control operations across multiple data centers, the following areas are especially important:

Replication Factor and Replica Placement Strategy – determine the number and location of replicas in the cluster. NetworkTopologyStrategy (the default placement strategy) has capabilities for fine-grained adjustment of the number and location of replicas at the data center and rack level.

Snitch – the snitch gives Cassandra information about the network topology of the cluster. For multi-data center deployments, it is important to make sure the snitch has complete and accurate information about the network, either by automatic detection (RackInferringSnitch) or details specified in a properties file (PropertyFileSnitch).

Consistency Level – Cassandra provides consistency levels that are specifically designed for scenarios with multiple data centers: LOCAL_QUORUM and EACH_QUORUM. Your priorities regarding availability and network latency will help determine which consistency levels you use for distributing write and read operations among nodes in multiple data centers.

The reference documentation for Clustering provides more detailed information on the specifics of these settings. The rest of this document describes how to optimize Cassandra across data centers for two types of scenarios:

  • Geographical Distribution – Two data centers supporting two separate regions of a social networking site. Emphasis is on cutting down network latency, with low consistency requirements.
  • Disaster Recovery – Two data centers, where the second is maintained chiefly for disaster recovery, and only the primary data center serves read requests during normal operations.

Geographical Distribution

In this scenario, a growing social networking site has opened a second data center in a region into which its user base is rapidly expanding. The primary goal is to minimize network latency and provide a high level of service. Because the nature of the data – text posts, comments, discussions – is not critically time-sensitive, consistency levels can be lowered as part of the drive to increase the efficiency of network operations.

To achieve these aims, administrators configure replication across the two data centers with these settings:

Replica Placement Strategy: NetworkTopologyStrategy (NTS)

Replication Factor: 3 for each data center, as determined by the following strategy_options settings:

strategy_options:
DC1 : 3
DC2 : 3

The Cassandra CLI command to set those options for the keyspace would look like the following:

[default@demo] update keyspace demo with strategy_options=[{DC1:3,  DC2:3}];

Snitch: RackInferringSnitch. Administrators have configured the network topology of the two data centers in such a way that Cassandra can accurately extrapolate the details automatically with RackInferringSnitch. This option requires less attention to configuration details than PropertyFileSnitch (which is described in the Redundancy for Failover scenario).

Write Consistency Level: LOCAL_QUORUM

Read Consistency Level: LOCAL_QUORUM

For all applications that write and read to Cassandra, the default CL for both reads and writes is LOCAL_QUORUM. This provides a reasonable level of data consistency while avoiding inter-data center latency.

Replicating a Write Operation

For purposes of our example, we can simplify the the data centers to only two racks, each containing three nodes: DC1 = 173.1.xxx.xxx and DC2 = 173.2.xxx.xxx. RackInferringSnitch detects the IP addresses and interprets the second octect as the data center, and the third octed as the rack.

The full array of IP addresses are as follows:

Data Center IP Rack IP Node IP
173.1.xxx.xxx 173.1.1.xxx 173.1.1.1
173.1.1.2
173.1.1.3
173.1.2.xxx 173.1.2.1
173.1.2.2
173.1.2.3
173.2.1.xxx 173.2.1.xxx 173.2.1.1
173.2.1.2
173.2.1.3
173.2.2.xxx 173.2.2.1
173.2.2.2
173.2.2.3

Since NTS places replicas on different racks within a data center whenever possible, it will place at least one copy of the write on each rack in our example. (darker circles represent the nodes holding the token range to which the data belongs):

../../_images/Multi-DCReplication.png

Disaster Recovery

In this scenario, a governmental department storing critical data has extended Cassandra across a second data center strictly for disaster recovery purposes. The second data center receives copies of all write operations at a high consistency level, to ensure that it always has fresh information. The second data center does not serve read operations unless it is triggered to go “live” by a disaster event that affects the normal operations of the first data center.

To achieve these aims, administrators configure replication across the two data centers with these settings:

Replica Placement Strategy: NetworkTopologyStrategy (NTS)

Replication Factor: 3 for the primary write/read center, and two for the failover data center, as determined by the following strategy_options settings:

strategy_options:
DC1 : 3
DC2 : 2

The Cassandra CLI command to set those options for the keyspace would look like the following:

[default@demo] update keyspace demo with strategy_options=[{DC1:3,  DC2:2}];

Snitch: PropertyFileSnitch. Administrators of this cluster require the fine-grained control of the network topology provided by PropertyFileSnitch. In practice this snitch would be especially useful whenever IP configuration is “mixed,” with physical node locations that do not map so readily to IP addresses; but to keep the illustration simple, our example will use IP addresses very similar to the geographical distribution example. With those IP addresses, this scenario’s copy of $CASSANDRA_HOME/conf/cassandra-topology.properties includes the following entries:

# Cassandra Node IP=Data Center:Rack

175.1.1.1=DC1:RAC1
175.1.1.2=DC1:RAC1
175.1.1.3=DC1:RAC1

175.1.2.1=DC1:RAC2
175.1.2.2=DC1:RAC2
175.1.2.3=DC1:RAC2

175.2.1.1=DC2:RAC1
175.2.1.2=DC2:RAC1

175.2.2.1=DC2:RAC2
175.2.2.2=DC2:RAC2

# default for unknown nodes
default=DC1:r1

Write Consistency Level: EACH_QUORUM In order to make absolutely sure that critical data is written to the failover data center, write operations use EACH_QUORUM.

Read Consistency Level: LOCAL_QUORUM Read operations are sent only to coordinating nodes in the active data center, so LOCAL_QUORUM suffices to provide the required read consistency.

Replication with Read/Write Operations

By design, this example disaster recovery setup is somewhat rigid – if one of the nodes in DC2 stops responding, Cassandra will report failures to the client application. The same is true of failures in the link between the data centers. With EACH_QUORUM for write operations, the write must be successful in DC2 or the entire operation will fail.

../../_images/Multi-DC2.png

Note that the read operation returns results with the most recent timestamp after two replicas reporting. In this case, with a replication factor of three, two replicas are contacted.