
Multi-datacenter Replication in Cassandra

By Joaquin Casares - November 5, 2012 | 6 Comments


DataStax Enterprise's heavy usage of Cassandra's innate datacenter concepts is important because it allows multiple workloads to run across multiple datacenters, operating on near real-time data without ETL processes or any other manual operations. In most setups, this is handled via "virtual" datacenters that follow Cassandra's internals for datacenters, while the actual hardware exists in the same physical datacenter. Over the course of this blog post, we will cover this and a couple of other use cases for multiple datacenters.

Workload Separation

By implementing datacenters as the divisions between varying workloads, DataStax Enterprise allows a natural distribution of data from real-time datacenters to near real-time analytics and search datacenters.

Whenever a write comes in via a client application, it hits the main Cassandra datacenter and returns an acknowledgment at the requested consistency level (typically lower than LOCAL_QUORUM, to allow for high throughput and low latency). In parallel and asynchronously, these writes are sent to the Analytics and Solr datacenters according to the replication strategy of each active keyspace. A typical replication strategy would look similar to {Cassandra: 3, Analytics: 2, Solr: 1}, depending on use cases and throughput requirements.
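In modern CQL, that kind of strategy maps directly onto a NetworkTopologyStrategy keyspace definition. A minimal sketch, assuming a hypothetical keyspace name and DSE-style datacenter names (the names in the replication map must match what your snitch reports):

```sql
-- Hypothetical keyspace: 3 replicas in the real-time Cassandra DC,
-- 2 in the Analytics DC, and 1 in the Solr DC.
CREATE KEYSPACE sensor_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'Cassandra': 3,
    'Analytics': 2,
    'Solr': 1
  };
```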

Once these asynchronous writes are received by the other datacenters, they undergo the normal write procedures and are assimilated into that datacenter. This way, any analytics jobs that are running can access the new data without an ETL process. On DSE's Solr nodes, these writes are introduced into the memtables and additional Solr processes are triggered to index the data.

Live Backup Scenario

Some Cassandra use cases instead use different datacenters as a live backup that can quickly be used as a fallback cluster. We will cover the most common such use case, on Amazon Web Services' Elastic Compute Cloud (EC2), in the following example.

Specific Use Case

Allow your application to have multiple fallback patterns across multiple consistencies and datacenters.


Suppose a client application sends requests to EC2's US-East-1 region at a consistency level (CL) of LOCAL_QUORUM. Later, another datacenter is added in EC2's US-West-1 region to serve as a live backup. The replication strategy can be a full live backup ({US-East-1: 3, US-West-1: 3}) or a smaller live backup ({US-East-1: 3, US-West-1: 2}) to save cost and disk usage in this regional-outage scenario. All clients continue to write to the US-East-1 nodes by restricting the clients' connection pools to just those nodes, minimizing cross-datacenter latency.
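In modern CQL, switching an existing keyspace to the full live-backup layout is a replication change (the keyspace name here is hypothetical, and the datacenter names must match what the snitch reports):

```sql
-- Full live backup: 3 replicas in each region.
ALTER KEYSPACE app_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'US-East-1': 3,
    'US-West-1': 3
  };
```

After the change, the new US-West-1 nodes still need the existing data streamed to them; running `nodetool rebuild US-East-1` on each new node is the usual way to do that (check the exact syntax for your Cassandra version).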

To implement a cascading fallback, the client's connection pool is initially aware only of the nodes in the US-East-1 region. On client errors, requests are retried at a CL of LOCAL_QUORUM up to X times, then drop to a CL of ONE while escalating the appropriate notifications. If requests are still unsuccessful, the client switches to a new connection pool made up of US-West-1 nodes and begins contacting US-West-1 at a higher CL, before ultimately dropping down to a CL of ONE. Meanwhile, any writes accepted by US-West-1 should also be tried asynchronously on US-East-1 by the client, without waiting for confirmation and instead logging any errors separately.
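The cascading fallback can be sketched as a consistency-level ladder. This is a minimal simulation with hypothetical names; a real client would hook into its driver's retry and load-balancing policies instead of raw callables:

```python
# Illustrative sketch of the cascading-fallback policy described above.
# `send` stands in for the application's request function.

RETRIES = 3  # the "X times" from the text; tune for your application


def execute_with_fallback(send, primary_pool, backup_pool, retries=RETRIES):
    """Try the primary DC at LOCAL_QUORUM, then ONE; then the backup DC
    at LOCAL_QUORUM, then ONE. Returns (pool_name, cl) on success."""
    ladder = [
        ("primary", primary_pool, "LOCAL_QUORUM"),
        ("primary", primary_pool, "ONE"),           # escalate notifications here
        ("backup",  backup_pool,  "LOCAL_QUORUM"),  # switch connection pools
        ("backup",  backup_pool,  "ONE"),
    ]
    last_error = None
    for name, pool, cl in ladder:
        for _ in range(retries):
            try:
                send(pool, cl)
                return name, cl
            except Exception as exc:  # in practice: driver-specific errors
                last_error = exc
    raise RuntimeError("all datacenters exhausted") from last_error
```

A failed primary region is then transparent to callers: the same call simply returns a result from the backup pool at a degraded consistency level, which the application can log and alert on.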

With a setup like this, natural disasters and other failures can be kept from affecting your live applications.


As long as the original datacenter is restored within gc_grace_seconds (10 days by default), perform a rolling repair (without the -pr option) on all of its nodes once they come back online.

If, however, the nodes will not come back up and complete their repairs until after gc_grace_seconds has elapsed, you will need to take the following steps to ensure that deleted records are not resurrected:

  • remove all the offending nodes from the ring using `nodetool removetoken`,
  • clear all their data (data directories, commit logs, snapshots, and system logs),
  • decrement all tokens in this region,
  • disable auto_bootstrap,
  • start up the entire region,
  • and, finally, run a rolling repair (without the -pr option) on all nodes in the other region.
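As an operational sketch of the steps above (hosts, tokens, and paths are placeholders; verify each command against your Cassandra version before running anything):

```shell
# Operational sketch only -- hosts, tokens, and paths are placeholders.

# 1. Remove each offending node from the ring (run against a live node):
nodetool -h <live-node> removetoken <token-of-offending-node>

# 2. On each offending node, clear all state: data directories, commit
#    logs, snapshots, and system logs (default locations shown):
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* \
       /var/lib/cassandra/saved_caches/* /var/log/cassandra/*

# 3. In each node's cassandra.yaml, set initial_token to the old token
#    minus one and set auto_bootstrap: false, then start the entire region.

# 4. Finally, run a rolling repair (without -pr) on all nodes in the
#    other region, one node at a time:
nodetool -h <node-in-other-region> repair
```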

After these nodes are up to date, you can restart your applications and continue using your primary datacenter.

Geographical Location Scenario

There are certain use cases where data should be housed in different datacenters depending on the user's location, in order to provide more responsive exchanges. The use case we cover here involves datacenters in different countries, but the same logic and procedures apply to datacenters in different regions. The logic that decides which datacenter a user connects to resides in the application code.

Specific Use Case

Have users connect to datacenters based on geographic location, but ensure this data is available cluster-wide for backup, analytics, and to account for user travel across regions.


The required end result is for users in the US to contact one datacenter while UK users contact another to lower end-user latency. An additional requirement is for both of these datacenters to be a part of the same cluster to lower operational costs. This can be handled using the following rules:

  • When reading and writing from each datacenter, ensure that the clients the users connect to can only see one datacenter, based on the list of IPs provided to the client.
  • If doing reads at QUORUM, ensure that LOCAL_QUORUM is being used and not EACH_QUORUM since this latency will affect the end user's performance experience.
  • Depending on how consistent you want your datacenters to be, you may choose to run repair operations (without the -pr option) more frequently than the required once per gc_grace_seconds.
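To see why EACH_QUORUM hurts the end user here, compare how many acknowledgments each level waits for. A small illustration with assumed replication factors of 3 per datacenter (quorum = floor(rf / 2) + 1):

```python
# How many replica acknowledgments a request waits for, given
# per-datacenter replication factors. Numbers here are illustrative.

def quorum(rf):
    """Quorum for a replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1

replication = {"US": 3, "UK": 3}  # hypothetical datacenter names
local_dc = "US"

# LOCAL_QUORUM: a quorum of replicas in the coordinator's datacenter only.
local_quorum_acks = quorum(replication[local_dc])

# EACH_QUORUM: a quorum in *every* datacenter, so the request path
# includes cross-Atlantic round trips.
each_quorum_acks = sum(quorum(rf) for rf in replication.values())

print(local_quorum_acks)  # 2 (all local)
print(each_quorum_acks)   # 4 (two of which must cross the Atlantic)
```

The write is still replicated to the remote datacenter either way; LOCAL_QUORUM simply keeps the slow round trips off the user-facing request path.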


When setting up clusters in this manner:

  • You ensure faster performance for each end user.
  • You gain multi-region live backups for free, as mentioned in the section above. Granted, requests between the US and UK will not be as fast, but your application does not have to come to a complete standstill in the event of catastrophic losses.
  • Users can travel across regions, and in the time it takes them to travel, their information should have finished replicating asynchronously across regions.


A few general pointers to keep in mind:

  • Always use the NetworkTopologyStrategy when using multiple datacenters. SimpleStrategy will work across datacenters, but it has the following disadvantages:
    • It is harder to maintain.
    • It does not provide any of the features mentioned above.
    • It introduces latency on each write (depending on the distance and latency between datacenters).
  • Defining one rack for the entire cluster is the simplest and most common implementation. Multiple racks should be avoided for the following reasons:
    • Most users tend to ignore or forget the rack requirement that racks be laid out in an alternating order, which is what allows data to be distributed safely and appropriately.
    • Many users do not use rack information effectively, for example by defining as many racks as they have nodes, or other non-beneficial configurations.
    • When racks are used correctly, each rack should typically contain the same number of nodes.
    • In a scenario that requires a cluster expansion while using racks, the expansion procedure can be tedious, since it typically involves several node moves and must ensure that racks continue to distribute data correctly and evenly. When a cluster needs immediate expansion, racks should be the last thing to worry about.



  1. Sankalp says:

    One of the problems in your “Geographical Location Scenario” is that you cannot define the replication factor by the region the user comes from; it can only be defined per keyspace.
    This causes the number of replicas to be the same in each DC in your scenario.
    What would ideally be required is to store more copies in the DC closest to the user. For example:
    you store 3 copies in the US for a US user and 1 copy in Europe;
    for a Europe user, 3 copies in Europe and 1 copy in the US.

    The way to do that right now is to assign these sets of users to different keyspaces with different replication factors. But then you need a mapping of which user belongs to which keyspace.

  2. Joaquin Casares says:

    You’re correct. The purpose for this blog post was to highlight the basic options to allow most users to start their planning before moving into use case specific issues.

    Yes, that is a good advanced use case that sounds ideal. Thanks for sharing!

  3. Aniket B. Dumbare says:

    This article advises against using multiple racks; however, as far as I understand from other DataStax documentation, that advice applies only to the single-token architecture and not to virtual nodes.

  4. Sivaji says:

    Consider 2 datacenters in different geographical locations with 2 nodes in each DC; correct me if any statement I have made is incorrect.

    US-DC1’s coordinator receives a write request from a client and forwards it to its local nodes containing the replicas. For a successful write at the LOCAL_QUORUM consistency level, the local nodes must acknowledge the write to the coordinator within the write_request_timeout_in_ms = 60000 ms period.

    When US-DC1’s coordinator receives the write request, it also sends the write request to a coordinator in EU-DC1, which passes the request on to the local nodes containing the replica data. These EU-DC1 nodes are not required to acknowledge within the 60000 ms for the write mutation to succeed; only the US-DC1 nodes need to respond within that window.

    So my question is what timers govern the write requests happening at EU-DC1? The EU-DC1 coordinator has to respond back to the US-DC1 coordinator within a specific timeout period, otherwise US-DC1 coordinator stores a hinted handoff for EU-DC1, so what is that specific timeout period called in the cassandra.yaml file?

    We have a situation where tcpdumps at the unix level are clean, which means no packet drops at the switch. But the hints size is growing on US-DC1 with “Timed out” messages in system.log. Our investigation shows the EU-DC1 nodes were healthy and under-utilized, with no network issues between the US and EU DCs!

  5. rico says:

    Does Cassandra support offline synchronization? Just curious. :)

  6. Carlos Gomes says:

    I’m working with NoSQL replication modeling. How can I get Cassandra replication statistics?

    Thank you!

