DataStax Developer Blog

Deploying Cassandra across Multiple Data Centers

By Eric Gilmore - March 23, 2011 | 7 Comments

Cassandra is designed as a distributed system for deploying large numbers of nodes across multiple data centers. Key features of Cassandra’s distributed architecture are specifically tailored for multiple-data center deployment. These features are robust and flexible enough that you can configure the cluster for optimal geographical distribution, for redundancy in failover and disaster recovery, or even for creating a dedicated analytics center replicated from your main data storage centers.

Settings central to multi-data center deployment include:

Replication Factor and Replica Placement Strategy – NetworkTopologyStrategy (the default placement strategy) allows fine-grained control over the number and location of replicas at the data center and rack level.

Snitch – For multi-data center deployments, it is important to make sure the snitch has complete and accurate information about the network, whether through automatic detection (RackInferringSnitch) or through details specified in a properties file (PropertyFileSnitch).

Consistency Level – Cassandra provides consistency levels designed specifically for scenarios with multiple data centers: LOCAL_QUORUM and EACH_QUORUM. Here, “local” means local to a single data center, while “each” means the same quorum requirement must be met in every data center.
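
To make the distinction concrete, here is a minimal sketch of a client choosing between the two levels. It uses the current DataStax Python driver, which post-dates this post; the contact point, keyspace, and users table are hypothetical.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Connect through a node in the local data center (address is hypothetical).
cluster = Cluster(['10.1.1.10'])
session = cluster.connect('app_keyspace')

# LOCAL_QUORUM: only a quorum of the replicas in the coordinator's own
# data center must acknowledge the write.
local_write = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
session.execute(local_write, ('u-1001', 'Alice'))

# EACH_QUORUM: a quorum of replicas in every data center must acknowledge
# the write before it is reported successful.
global_write = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.EACH_QUORUM)
session.execute(global_write, ('u-1002', 'Bob'))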

Putting it all Together

Your specific needs will determine how you combine these ingredients in a “recipe” for multi-data center operations. For instance, an organization whose chief aim is to minimize network latency across two large service regions might end up with a relatively simple recipe for two data centers like the following:

Replica Placement Strategy: NetworkTopologyStrategy (NTS)

Replication Factor: 3 for each data center, as determined by the following strategy_options settings in cassandra.yaml:

strategy_options:
    DC1: 3
    DC2: 3
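
In later Cassandra versions, these per-data-center replication factors are specified when the keyspace is created rather than in cassandra.yaml. Here is a sketch of the equivalent CQL, issued through the DataStax Python driver with a hypothetical keyspace name and contact point.

from cassandra.cluster import Cluster

# Contact point is hypothetical; use any node in the local data center.
cluster = Cluster(['10.1.1.10'])
session = cluster.connect()

# NetworkTopologyStrategy with three replicas in each of the two data centers,
# mirroring the strategy_options shown above.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app_keyspace
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 3,
        'DC2': 3
    }
""")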

Snitch: RackInferringSnitch. Administrators lay out the network so that each node’s IP address encodes its location: the second octet identifies the data center and the third octet identifies the rack, letting RackInferringSnitch infer the topology automatically.
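
The rule RackInferringSnitch applies is simple enough to sketch in a few lines of Python; the addresses below are made up for illustration.

# RackInferringSnitch reads a node's data center from the second octet of its
# IP address and its rack from the third octet.
def infer_topology(ip_address):
    octets = ip_address.split('.')
    return {'datacenter': octets[1], 'rack': octets[2]}

print(infer_topology('10.100.1.15'))   # {'datacenter': '100', 'rack': '1'}
print(infer_topology('10.200.2.27'))   # {'datacenter': '200', 'rack': '2'}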

Write Consistency Level: LOCAL_QUORUM

Read Consistency Level: LOCAL_QUORUM

All applications that read from and write to Cassandra use LOCAL_QUORUM as the default consistency level for both reads and writes. This provides a reasonable level of data consistency while avoiding inter-data center latency.
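
One way for an application to make LOCAL_QUORUM its default for every request is to set it once when the session is created. Here is a sketch with the current DataStax Python driver (which post-dates this post); the contact point, keyspace, and table are hypothetical.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

# Default every statement issued from this session to LOCAL_QUORUM.
profile = ExecutionProfile(consistency_level=ConsistencyLevel.LOCAL_QUORUM)
cluster = Cluster(['10.1.1.10'],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect('app_keyspace')

# Both the write and the read below inherit LOCAL_QUORUM from the profile.
session.execute("UPDATE users SET name = %s WHERE user_id = %s",
                ('Alice', 'u-1001'))
row = session.execute("SELECT name FROM users WHERE user_id = %s",
                      ('u-1001',)).one()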

Visualizing It

In the following depiction of a write operation across our two hypothetical data centers, the darker grey nodes are the nodes that contain the token range for the data being written.

[Figure: Apache Cassandra multi-data center replication]

Note that LOCAL_QUORUM consistency allows the write operation to the second data center to be asynchronous. This way, the operation can be marked successful in the first data center – the data center local to the origin of the write – and Cassandra can serve read operations on that data without any delay from inter-data center latency.

Learning More about It

For more detail and further descriptions of multiple-data center deployments, see Multiple Data Centers in the DataStax reference documentation. And make sure to check this blog regularly for news on the latest progress in multi-DC features, analytics, and other exciting areas of Cassandra development.

Comments

  1. Sudesh says:

    I have 4 node cluster setup across 2 DC. DC1 contains 10.28.7.209 and 10.28.7.218. DC2 contains 10.40.7.206 and 10.40.7.191.

    I have keyspace with NTS and strategy_options= DC1: 2 , DC2:2.

    I have snitch as PropertyFileSnitch.

    Topology.properties has definitions :
    10.28.7.209=DC1:RAC1
    10.28.7.218=DC1:RAC2
    10.40.7.206=DC2:RAC1
    10.40.7.191=DC2:RAC2
    default=DC1:r1

    Tokens are balanced as per:
    bin/nodetool -h 10.40.7.206 ring
    Address Status State Load Owns Token
    131291297286969809762394636651102920798
    10.40.7.191 Up Normal 11.01 GB 25.02% 3713143261524536428796724100157456993
    10.28.7.218 Up Normal 21.79 GB 25.00% 46247024543280544328625128736684811735
    10.40.7.206 Up Normal 11.01 GB 24.97% 88731726328828585514925390034962410388
    10.28.7.209 Up Normal 22.19 GB 25.01% 131291297286969809762394636651102920798

    With randompartitioner I inserted 1M records, and was expecting 1M rows on each node. I ended up with 1M each on DC1:RAC1 and DC2:RAC2, while ended up with just 75K each on DC2:RAC1 and DC2:RAC2.

    I copied same topology.properties files on all 4 nodes.

    Any clues??

  2. Sudesh says:

    Sorry typo …
    With randompartitioner I inserted 1M records, and was expecting 1M rows on each node. I ended up with 1M each on DC1:RAC1 and DC1:RAC2, while ended up with just 75K each on DC2:RAC1 and DC2:RAC2.

  3. Eric says:

    Sudesh, from my own testing and other sources I depend on as a technical writer, I have not been able to get a definite answer to your question. The initial tokens and other configuration details look fine, and there is no clear reason for your load to fall out of balance.

    I strongly suggest posting the same question on the Cassandra user list and letting the collective mind of the community explore the issue.

  4. Eric Tamme says:

    Sudesh, the problem lies with your initial token selection. During a write request, NetworkTopologyStrategy uses PropertyFileSnitch to select the nodes available in a given data center, then it places the write on whichever of those nodes is responsible for the token range – it DOES NOT choose from the entire set of nodes in the cluster. It works on each DC separately. Because of this, within each DC one of your nodes will end up getting 75% of the writes and the other 25%.

    Select your tokens for each data center individually, then offset any collisions by 1. Sorry I can't be more verbose in this reply. This question has been answered on the Cassandra users mailing list; see the following thread:

    http://www.mail-archive.com/user@cassandra.apache.org/msg14981.html

  5. Cassa says:

    I have 4 node cluster setup across 2 DC. DC1 contains host a and b. DC2 contains host c and d.

    I have keyspace with NTS and strategy_options= DC1: 2 , DC2:2.

    I have snitch as PropertyFileSnitch.

    Topology.properties has definitions :
    a=DC1:RAC1
    b=DC1:RAC2
    c=DC2:RAC1
    d=DC2:RAC2
    default=DC1:r1

    Tokens :
    a 0
    b 42535295865117307932921825928971026432
    c 1
    d 42535295865117307932921825928971026433

    I have partitioner randompartitioner.

    I am trying to get/put data into the cache every 5 seconds. I see that there are many misses while reading. I did not configure a consistency level, so I hope it should take QUORUM. I am not sure if I am missing something else.

  6. dosuser says:

    In version 1.2.0 with PropertyFileSnitch, uppercase "DC1" doesn't work.
    It should be lowercase.

  7. Yashika says:

    Hi,

    I want to restore Cassandra data from a “SimpleStrategy” Placement Strategy cluster to a “NetworkTopology” Strategy Cluster.

    Are there any changes in the backup-restore process?
