Replication is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance.
Cassandra stores copies, called replicas, of each row based on the row key. You set the number of replicas when you create a keyspace using the replica placement strategy. In addition to setting the number of replicas, this strategy sets the distribution of the replicas across the nodes in the cluster depending on the cluster's topology.
The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes afterwards. When replication factor exceeds the number of nodes, writes are rejected, but reads are served as long as the desired consistency level can be met.
To determine the physical location of nodes and their proximity to each other, you need to configure a snitch for your cluster in addition to the replication strategy.
The available strategies are:
Use SimpleStrategy for simple single data center clusters. This strategy is the default replica placement strategy when creating a keyspace using the Cassandra CLI. See Creating a Keyspace. When using the Cassandra Query Language interface, you must explicitly specify a strategy. See CREATE KEYSPACE.
SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering rack or data center location. The following graphic shows three replicas of three rows placed across four nodes:
Use NetworkTopologyStrategy when you have (or plan to have) your cluster deployed across multiple data centers. This strategy specifies how many replicas you want in each data center.
When deciding how many replicas to configure in each data center, the two primary considerations are (1) being able to satisfy reads locally, without incurring cross-datacenter latency, and (2) failure scenarios. The two most common ways to configure multiple data center clusters are:
Asymmetrical replication groupings are also possible. For example, you can have three replicas per data center to serve real-time application requests and use a single replica for running analytics.
The NetworkTopologyStrategy determines replica placement independently within each data center as follows:
NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) can fail at the same time due to power, cooling, or network issues.
The following example shows how NetworkTopologyStrategy places replicas spanning two data centers with a total replication factor of 4. When using NetworkToplogyStrategy, you set the number of replicas per data center.
In the following graphic, notice the tokens are assigned to alternating racks. For more information, see Generating Tokens.
NetworkTopologyStrategy relies on a properly configured snitch to place replicas correctly across data centers and racks. It is important to configure your cluster to use the type of snitch that correctly determines the locations of nodes in your network.
Be sure to use NetworkTopologyStrategy instead of the OldNetworkTopologyStrategy, which supported only a limited configuration of 3 replicas across 2 data centers, without control over which data center got the two replicas for any given row key. This strategy meant that some rows had two replicas in the first and one replica in the second, while others had two in the second and one in the first.
A snitch maps IPs to racks and data centers. It defines how the nodes are grouped together within the overall network topology. Cassandra uses this information to route inter-node requests as efficiently as possible. The snitch does not affect requests between the client application and Cassandra and it does not control which node a client connects to.
You configure snitches in the cassandra.yaml configuration file. All nodes in a cluster must use the same snitch configuration.
The following snitches are available:
The SimpleSnitch (the default) does not recognize data center or rack information. Use it for single-data center deployments (or single-zone in public clouds).
When defining your keyspace strategy_options, use replication_factor=<#>.
The RackInferringSnitch infers (assumes) the topology of the network by the octet of the node's IP address. Use this snitch as an example of writing a custom Snitch class.
When defining your keyspace strategy_options, use the second octet number of your node IPs for your data center name. In the above graphic, you would use 100 for the data center name.
The PropertyFileSnitch determines the location of nodes by rack and data center. This snitch uses a user-defined description of the network details located in the property file cassandra-topology.properties. Use this snitch when your node IPs are not uniform or if you have complex replication grouping requirements.
When using this snitch, define your data center names as desired and make sure that the data center names defined in the cassandra-topology.properties file correlates to the name of your data centers in your keyspace strategy_options. Every node in the cluster should be described in this file, and the file should be exactly the same on every node in the cluster.
For example, if your cluster had non-uniform IPs and two physical data centers with two racks in each, and a third logical data center for replicating analytics data, the cassandra-topology.properties file might look like this:
# Data Center One 220.127.116.11=DC1:RAC1 18.104.22.168=DC1:RAC1 22.214.171.124=DC1:RAC1 126.96.36.199=DC1:RAC2 188.8.131.52=DC1:RAC2 184.108.40.206=DC1:RAC2 # Data Center Two 220.127.116.11=DC2:RAC1 18.104.22.168=DC2:RAC1 22.214.171.124=DC2:RAC1 126.96.36.199=DC2:RAC2 188.8.131.52=DC2:RAC2 184.108.40.206=DC2:RAC2 # Analytics Replication Group 220.127.116.11=DC3:RAC1 18.104.22.168=DC3:RAC1 22.214.171.124=DC3:RAC1 # default for unknown nodes default=DC3:RAC1
The GossipingPropertyFileSnitch allows you to define a local node's data center and rack and use gossip for propagating the information to other nodes. To define the data center and rack, create a cassandra-rackdc.properties file in the node's conf directory. For example:
To migrate from the PropertyFileSnitch to the GossipingPropertyFileSnitch, update one node at a time to allow gossip time to propagate. The PropertyFileSnitch is used as a fallback when cassandra-topologies.properties is present.
Use the EC2Snitch for simple cluster deployments on Amazon EC2 where all nodes in the cluster are within a single region. The region is treated as the data center and the availability zones are treated as racks within the data center. For example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location. Because private IPs are used, this snitch does not work across multiple regions.
When defining your keyspace strategy_options, use the EC2 region name (for example,``us-east``) as your data center name.
Use the EC2MultiRegionSnitch for deployments on Amazon EC2 where the cluster spans multiple regions. As with the EC2Snitch, regions are treated as data centers and availability zones are treated as racks within a data center. For example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location.
This snitch uses public IPs as broadcast_address to allow cross-region connectivity. This means that you must configure each Cassandra node so that the listen_address is set to the private IP address of the node, and the broadcast_address is set to the public IP address of the node. This allows Cassandra nodes in one EC2 region to bind to nodes in another region, thus enabling multiple data center support. (For intra-region traffic, Cassandra switches to the private IP after establishing a connection.)
Additionally, you must set the addresses of the seed nodes in the cassandra.yaml file to that of the public IPs because private IPs are not routable between networks. For example:
seeds: 126.96.36.199, 188.8.131.52
To find the public IP address, run this command from each of the seed nodes in EC2:
When defining your keyspace strategy_options, use the EC2 region name, such as us-east, as your data center names.
By default, all snitches also use a dynamic snitch layer that monitors read latency and, when possible, routes requests away from poorly-performing nodes. The dynamic snitch is enabled by default; it is recommended for use in most deployments.
Configure dynamic snitch thresholds for each node in the cassandra.yaml configuration file. For more information, see the properties listed under Inter-node Communication and Fault Detection Properties.