Apache Cassandra 0.8 Documentation

About Replication in Cassandra

Replication is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance. When you create a keyspace in Cassandra, you must decide the replica placement strategy: the number of replicas and how those replicas are distributed across nodes in the cluster. The replication strategy relies on the cluster-configured snitch to help it determine the physical location of nodes and their proximity to each other.

The total number of replicas across the cluster is often referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row. A replication factor of 2 means two copies of each row. All replicas are equally important; there is no primary or master replica in terms of how read and write requests are handled.

As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, it is possible to increase the replication factor and then add the desired number of nodes afterwards. When the replication factor exceeds the number of nodes, writes will be rejected, but reads will be served as long as the desired consistency level can be met.

About Replica Placement Strategy

The replica placement strategy determines how replicas for a keyspace will be distributed across the cluster. The replica placement strategy is set when you create a keyspace.

There are a number of strategies to choose from based on your goals and the information you have about where nodes are located.

SimpleStrategy

SimpleStrategy is the default replica placement strategy when creating a keyspace using the Cassandra CLI. Other interfaces, such as CQL, require you to explicitly specify a strategy.

SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering rack or data center location.

../../_images/simple_strategy_replication.png
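
The placement rule described above can be sketched in a few lines of Python. This is only an illustrative model, not Cassandra's implementation: the function name, the small integer tokens, and the node names are all hypothetical, and the sketch assumes each node owns the range of tokens up to and including its own token.

```python
from bisect import bisect_left

def simple_strategy_replicas(ring, key_token, replication_factor):
    """Return the nodes that would hold replicas for key_token.

    ring is a list of (token, node_name) pairs, one per node (hypothetical layout).
    """
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if necessary.
    start = bisect_left(tokens, key_token) % len(ring)
    # A node can hold at most one replica of a row in this model.
    count = min(replication_factor, len(ring))
    # Remaining replicas: the next nodes clockwise, ignoring rack and data center.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

ring = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]
print(simple_strategy_replicas(ring, key_token=30, replication_factor=3))
# prints ['node3', 'node4', 'node1']
```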

NetworkTopologyStrategy

NetworkTopologyStrategy is the preferred replica placement strategy when you have information about how nodes are grouped in your data center, or you have (or plan to have) your cluster deployed across multiple data centers. This strategy allows you to specify how many replicas you want in each data center.

When deciding how many replicas to configure in each data center, the primary considerations are (1) being able to satisfy reads locally, without incurring cross-datacenter latency, and (2) failure scenarios.

The two most common ways to configure multiple data center clusters are:

  • Two replicas in each data center. This configuration tolerates the failure of a single node per replication group, and still allows local reads at a consistency level of ONE.
  • Three replicas in each data center. This configuration tolerates the failure of one node per replication group at a strong consistency level of LOCAL_QUORUM, or tolerates multiple node failures per data center using consistency level ONE.
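
The failure tolerance quoted for these two configurations follows from simple arithmetic. The sketch below assumes a quorum is a majority of the replicas in the local data center (half the replica count, rounded down, plus one); the function names are hypothetical.

```python
def required_replicas(consistency_level, replicas_in_dc):
    """Number of local replicas that must respond at the given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "LOCAL_QUORUM":
        # Assumed definition: a majority of the replicas in the local data center.
        return replicas_in_dc // 2 + 1
    raise ValueError("unsupported consistency level: " + consistency_level)

def tolerated_failures(consistency_level, replicas_in_dc):
    """Replica nodes that can be down while requests still succeed locally."""
    return replicas_in_dc - required_replicas(consistency_level, replicas_in_dc)

for replicas in (2, 3):
    for level in ("ONE", "LOCAL_QUORUM"):
        print(replicas, level, tolerated_failures(level, replicas))
# prints:
# 2 ONE 1
# 2 LOCAL_QUORUM 0
# 3 ONE 2
# 3 LOCAL_QUORUM 1
```

With two replicas per data center, LOCAL_QUORUM needs both replicas, which is why that configuration is paired with consistency level ONE above.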

Asymmetrical replication groupings are also possible depending on your use case. For example, you may want to have three replicas per data center to serve real-time application requests, and then have a single replica in a separate data center dedicated to running analytics. In Cassandra, the term data center is synonymous with replication group: it is a grouping of nodes configured together for replication purposes. It does not have to be a physical data center.

With NetworkTopologyStrategy, replica placement is determined independently within each data center (or replication group). The first replica per data center is placed according to the partitioner (same as with SimpleStrategy). Additional replicas in the same data center are then determined by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas will be placed in the same rack. NetworkTopologyStrategy prefers to place replicas on distinct racks if possible, because nodes in the same rack (or similar physical grouping) can easily fail at the same time due to power, cooling, or network issues.

Here is an example of how NetworkTopologyStrategy would place replicas spanning two data centers with a total replication factor of 4 (two replicas in Data Center 1 and two replicas in Data Center 2):

../../_images/network_strategy_replication_ring.png

../../_images/network_strategy_replication_rack.png

Notice how tokens are assigned to alternating racks.
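
The rack-aware walk described in this section can be modeled roughly as follows. This is a simplified sketch of the rules as stated here, not the actual NetworkTopologyStrategy code; the function name, tokens, node names, and rack names are all hypothetical.

```python
from bisect import bisect_left

def nts_replicas(ring, key_token, replicas_per_dc):
    """ring is a list of (token, node, data_center, rack) tuples (hypothetical layout)."""
    ring = sorted(ring)
    placed = []
    # Placement is computed independently for each data center.
    for dc, wanted in replicas_per_dc.items():
        dc_ring = [entry for entry in ring if entry[2] == dc]
        if not dc_ring:
            continue
        tokens = [entry[0] for entry in dc_ring]
        # First replica: chosen by token position, as in SimpleStrategy.
        chosen = [dc_ring[bisect_left(tokens, key_token) % len(dc_ring)]]
        while len(chosen) < min(wanted, len(dc_ring)):
            previous = chosen[-1]
            start = dc_ring.index(previous)
            # Unused nodes in clockwise order after the previous replica.
            candidates = [dc_ring[(start + i) % len(dc_ring)]
                          for i in range(1, len(dc_ring))]
            candidates = [c for c in candidates if c not in chosen]
            # Prefer a different rack; fall back to the same rack if none exists.
            other_rack = [c for c in candidates if c[3] != previous[3]]
            chosen.append((other_rack or candidates)[0])
        placed.extend(entry[1] for entry in chosen)
    return placed

ring = [
    (0, "n1", "DC1", "RAC1"), (10, "n5", "DC2", "RAC1"),
    (25, "n2", "DC1", "RAC2"), (35, "n6", "DC2", "RAC2"),
    (50, "n3", "DC1", "RAC1"), (60, "n7", "DC2", "RAC1"),
    (75, "n4", "DC1", "RAC2"), (85, "n8", "DC2", "RAC2"),
]
print(nts_replicas(ring, key_token=30, replicas_per_dc={"DC1": 2, "DC2": 2}))
# prints ['n3', 'n4', 'n6', 'n7']
```

Because the tokens in this example alternate between racks, the second replica in each data center lands on the very next node of that data center.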

NetworkTopologyStrategy relies on a properly configured snitch to place replicas correctly across data centers and racks, so it is important to configure your cluster to use a snitch that can correctly determine the locations of nodes in your network.

Note

NetworkTopologyStrategy should be used in place of the OldNetworkTopologyStrategy, which only supports a limited configuration of exactly 3 replicas across 2 data centers, with no control over which data center gets two replicas for any given row key. Some rows will have two replicas in the first and one in the second, while others will have two in the second and one in the first.

About Snitches

The snitch is a configurable component of a Cassandra cluster used to define how the nodes are grouped together within the overall network topology (such as rack and data center groupings). Cassandra uses this information to route inter-node requests as efficiently as possible within the confines of the replica placement strategy. The snitch does not affect requests between the client application and Cassandra (it does not control which node a client connects to).

Snitches are configured for a Cassandra cluster in the cassandra.yaml configuration file. All nodes in a cluster should use the same snitch configuration. When assigning tokens, assign them to alternating racks. For example: rack1, rack2, rack3, rack1, rack2, rack3, and so on.

../../_images/multi_tokens_rack.png
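
A token plan that alternates racks can be generated with a short script. The sketch below assumes evenly spaced tokens over a range of 0 to 2**127, which is an assumption about the partitioner in use rather than something stated on this page; the function and rack names are hypothetical.

```python
def assign_tokens(node_count, racks, ring_size=2**127):
    """Return (token, rack) pairs with evenly spaced tokens and alternating racks.

    ring_size is an assumed token range; adjust it for your partitioner.
    """
    return [(i * ring_size // node_count, racks[i % len(racks)])
            for i in range(node_count)]

for token, rack in assign_tokens(6, ["rack1", "rack2", "rack3"]):
    print(rack, token)
```

Consecutive tokens in the output cycle through rack1, rack2, rack3, so neighboring nodes on the ring sit in different racks.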

The following snitches are available:

SimpleSnitch

The SimpleSnitch (the default) is appropriate if you have no rack or data center information available. Single-data center deployments (or single-zone in public clouds) usually fall into this category.

If using this snitch, use replication_factor=<#> when defining your keyspace strategy_options. This snitch does not recognize data center or rack information.

RackInferringSnitch

RackInferringSnitch infers the topology of the network by analyzing the node IP addresses. This snitch assumes that the second octet identifies the data center where a node is located, and the third octet identifies the rack.

../../_images/rack_inferring_snitch_ips.png

If using this snitch, use the second octet number of your node IPs as your data center names when defining your keyspace strategy_options. For example, 100 would be the data center name.
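
The octet convention can be illustrated with a small helper. It only mirrors the rule stated above and is not the snitch's own code; the function name and the sample address are hypothetical.

```python
def infer_location(ip_address):
    """Derive data center and rack names from an IPv4 address, per the octet rule."""
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address: " + ip_address)
    # Second octet identifies the data center, third octet identifies the rack.
    return {"data_center": octets[1], "rack": octets[2]}

print(infer_location("110.100.200.105"))
# prints {'data_center': '100', 'rack': '200'}
```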

PropertyFileSnitch

PropertyFileSnitch determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties. This snitch is best when your node IPs are not uniform or you have complex replication grouping requirements. See Configuring the PropertyFileSnitch for more information.

If using this snitch, you can define your data center names to be whatever you want. Just make sure the data center names you define in the cassandra-topology.properties file match the names you give your data centers in your keyspace strategy_options.
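
One way to catch a naming mismatch early is to compare the two sets of names with a script. The sketch below assumes the property file uses lines of the form address=data_center:rack plus an optional default entry; the addresses, names, and helper functions are hypothetical, so check them against your own file.

```python
SAMPLE_TOPOLOGY = """
# Hypothetical addresses and names, for illustration only.
175.56.12.105=DC1:RAC1
175.50.13.200=DC1:RAC2
120.53.24.101=DC2:RAC1
default=DC1:RAC1
"""

def parse_topology(text):
    """Parse 'address=data_center:rack' lines into a dict keyed by address."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        address, location = line.split("=", 1)
        data_center, rack = location.split(":", 1)
        topology[address.strip()] = (data_center.strip(), rack.strip())
    return topology

def unknown_data_centers(topology, strategy_options):
    """Data center names used by the keyspace but absent from the topology."""
    defined = {data_center for data_center, _ in topology.values()}
    return sorted(set(strategy_options) - defined)

topology = parse_topology(SAMPLE_TOPOLOGY)
print(unknown_data_centers(topology, {"DC1": 3, "DC3": 1}))
# prints ['DC3']
```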

EC2Snitch

EC2Snitch is for deployments on Amazon EC2 only. Instead of using the node’s IP address to infer node location, this snitch uses the AWS API to request the region and availability zone of a node. Regions are treated as data centers and availability zones are treated as racks within a data center. For example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location.

If using this snitch, use the EC2 region name (for example, us-east) as your data center names when defining your keyspace strategy_options.
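
The naming rule in the example above amounts to splitting the availability zone at its last hyphen. The helper below only illustrates that split; it does not call the AWS API, and the function name is hypothetical.

```python
def ec2_location(availability_zone):
    """Split an availability zone such as 'us-east-1a' into data center and rack."""
    # Everything before the last hyphen is the data center, the rest is the rack.
    data_center, rack = availability_zone.rsplit("-", 1)
    return {"data_center": data_center, "rack": rack}

print(ec2_location("us-east-1a"))
# prints {'data_center': 'us-east', 'rack': '1a'}
```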

About Dynamic Snitching

By default, all snitches are wrapped in a dynamic snitch layer that monitors read latency and, when possible, routes requests away from poorly performing nodes. The dynamic snitch is recommended for use in most deployments.

Dynamic snitch thresholds can be configured in the cassandra.yaml configuration file for a node.
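
The general idea of latency-aware routing can be sketched as follows. This is a deliberately simplified model, not the dynamic snitch's actual scoring algorithm: the class, its window size, and the use of a plain average over recent samples are all assumptions made for illustration.

```python
from collections import defaultdict, deque

class LatencyTracker:
    """Keeps recent read latencies per node and ranks replicas by their average."""

    def __init__(self, window=100):
        # Only the most recent `window` samples per node are retained.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, node, latency_ms):
        self.samples[node].append(latency_ms)

    def score(self, node):
        history = self.samples.get(node)
        if not history:
            return 0.0  # nodes with no samples yet are not penalized
        return sum(history) / len(history)

    def order_replicas(self, replicas):
        # Lowest average latency first, so slow nodes are tried last.
        return sorted(replicas, key=self.score)

tracker = LatencyTracker()
for node, latency_ms in [("n1", 50), ("n2", 5), ("n3", 20), ("n1", 70)]:
    tracker.record(node, latency_ms)
print(tracker.order_replicas(["n1", "n2", "n3"]))
# prints ['n2', 'n3', 'n1']
```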