Eric Gilmore

Cassandra is designed as a distributed system, for deployment of large numbers of nodes across multiple data centers. Key features of Cassandra’s distributed architecture are specifically tailored for multiple-data center deployment. These features are robust and flexible enough that you can configure the cluster for optimal geographical distribution, for redundancy for failover and disaster recovery, or even for creating a dedicated analytics center replicated from your main data storage centers.

Settings central to multi-data center deployment include:

Replication Factor and Replica Placement Strategy&nbsp;– NetworkTopologyStrategy (the default placement strategy) has capabilities for fine-grained adjustment of the number and location of replicas at the data center and rack level.

Snitch&nbsp;– For multi-data center deployments, it is important to make sure the snitch has complete and accurate information about the network, either by automatic detection (RackInferringSnitch) or details specified in a properties file (PropertyFileSnitch).

Consistency Level&nbsp;– Cassandra provides consistency levels that are specifically designed for scenarios with multiple data centers:&nbsp;<tt>LOCAL_QUORUM</tt>&nbsp;and&nbsp;<tt>EACH_QUORUM</tt>. Here, “local” means that a quorum of replicas must respond in the data center of the coordinator node handling the request, while “each” means that a quorum of replicas must respond in every data center.
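To make the difference between the two levels concrete, here is a minimal Python sketch of the quorum arithmetic. It assumes the standard quorum rule (half the replicas in a data center, rounded down, plus one). The function names and the data center names <tt>DC1</tt> and <tt>DC2</tt> are illustrative only and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(consistency_level: str, rf_by_dc: dict, local_dc: str) -> dict:
    """Return how many replica acknowledgements are needed per data center."""
    if consistency_level == "LOCAL_QUORUM":
        # Only the coordinator's own data center has to reach a quorum.
        return {local_dc: quorum(rf_by_dc[local_dc])}
    if consistency_level == "EACH_QUORUM":
        # Every data center has to reach its own quorum.
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    raise ValueError(f"unsupported consistency level: {consistency_level}")


# Hypothetical cluster: two data centers with three replicas each.
rf_by_dc = {"DC1": 3, "DC2": 3}
print(required_acks("LOCAL_QUORUM", rf_by_dc, local_dc="DC1"))  # {'DC1': 2}
print(required_acks("EACH_QUORUM", rf_by_dc, local_dc="DC1"))   # {'DC1': 2, 'DC2': 2}
```

With three replicas per data center, <tt>LOCAL_QUORUM</tt> waits for two acknowledgements in the local data center only, while <tt>EACH_QUORUM</tt> waits for two in each data center.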

<h3>Putting it all Together</h3>

Your specific needs will determine how you combine these ingredients in a “recipe” for multi-data center operations. For instance, an organization whose chief aim is to minimize network latency across two large service regions might end up with a relatively simple recipe for two data centers like the following:

Replica Placement Strategy: NetworkTopologyStrategy (NTS)

Replication Factor: 3 for each data center, as determined by the following&nbsp;<tt>strategy_options</tt>&nbsp;settings in&nbsp;<tt>cassandra.yaml</tt>:

<pre>
strategy_options:
  DC1 : 3
  DC2 : 3</pre>

Snitch: RackInferringSnitch. Administrators configure the network topology of the two data centers in such a way that Cassandra can accurately extrapolate the details automatically with RackInferringSnitch.

Write Consistency Level:&nbsp;<tt>LOCAL_QUORUM</tt>

Read Consistency Level:&nbsp;<tt>LOCAL_QUORUM</tt>

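In this recipe, every application that reads from or writes to Cassandra uses&nbsp;<tt>LOCAL_QUORUM</tt>&nbsp;as its default consistency level for both reads and writes. Because a quorum of three replicas is two, a read and a write issued in the same data center always overlap on at least one replica. This provides a reasonable level of data consistency while avoiding inter-data center latency.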
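The snitch choice in this recipe depends on an addressing convention: RackInferringSnitch treats the second octet of a node's IP address as its data center and the third octet as its rack. The following Python sketch imitates that inference so you can check an addressing plan before deploying. It is an illustration, not Cassandra's own code, and the function names and IP addresses are hypothetical.

```python
from collections import defaultdict


def infer_location(ip_address: str) -> tuple:
    """Mimic RackInferringSnitch: 2nd octet = data center, 3rd octet = rack."""
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip_address}")
    return octets[1], octets[2]


def group_nodes(ip_addresses) -> dict:
    """Group node addresses as {data_center: {rack: [addresses]}}."""
    topology = defaultdict(lambda: defaultdict(list))
    for ip in ip_addresses:
        data_center, rack = infer_location(ip)
        topology[data_center][rack].append(ip)
    return {dc: dict(racks) for dc, racks in topology.items()}


# Hypothetical addressing plan: 10.<data center>.<rack>.<node>
nodes = ["10.1.1.5", "10.1.2.5", "10.2.1.5", "10.2.2.5"]
for data_center, racks in group_nodes(nodes).items():
    print(data_center, racks)
```

If the grouping printed here does not match the physical layout, the addresses do not follow the convention, and a snitch driven by an explicit properties file (PropertyFileSnitch) is likely the safer choice.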

<h3>Visualizing It</h3>

In the following depiction of a write operation across our two hypothetical data centers, the darker grey nodes are the nodes that contain the token range for the data being written.

[Figure: a write operation across the two data centers]

Note that&nbsp;<tt>LOCAL_QUORUM</tt>&nbsp;consistency allows the write operation to the second data center to be asynchronous. This way, the operation can be marked successful as soon as a quorum of replicas acknowledges it in the first data center (the data center local to the origin of the write), and Cassandra can serve read operations on that data without any delay from inter-data center latency.
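The latency benefit can be sketched with a simplified model. The Python below assumes the coordinator sends the write to every replica in both data centers at once and then waits only for the acknowledgements its consistency level requires. The latency figures, data center names, and function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def dc_quorum_time(ack_latencies_ms: list) -> float:
    """Time at which a quorum of one data center's replicas has acknowledged."""
    needed = quorum(len(ack_latencies_ms))
    return sorted(ack_latencies_ms)[needed - 1]


def write_ack_time(ack_latency_ms: dict, local_dc: str, consistency_level: str) -> float:
    """Time until the coordinator can report the write as successful."""
    if consistency_level == "LOCAL_QUORUM":
        # Remote replicas still receive the write, but nobody waits for them.
        return dc_quorum_time(ack_latency_ms[local_dc])
    if consistency_level == "EACH_QUORUM":
        # The slowest data center's quorum determines the response time.
        return max(dc_quorum_time(lat) for lat in ack_latency_ms.values())
    raise ValueError(f"unsupported consistency level: {consistency_level}")


# Hypothetical acknowledgement latencies in milliseconds, seen from a DC1 coordinator.
latencies = {"DC1": [2, 3, 5], "DC2": [80, 85, 90]}
print(write_ack_time(latencies, "DC1", "LOCAL_QUORUM"))  # 3
print(write_ack_time(latencies, "DC1", "EACH_QUORUM"))   # 85
```

In this model the <tt>LOCAL_QUORUM</tt> write completes once the second-fastest local replica answers, while <tt>EACH_QUORUM</tt> also has to wait for a quorum in the remote data center.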

<h3>Learning More about It</h3>

For more detail and more descriptions of multiple-data center deployments, see&nbsp;Multiple Data Centers&nbsp;in the DataStax reference documentation. And make sure to check this blog regularly for news related to the latest progress in multi-DC features, analytics, and other exciting areas of Cassandra development.


Deploying Cassandra across Multiple Data Centers
