In Cassandra, data distribution and replication go together. This is because Cassandra is designed as a peer-to-peer system that makes copies of the data and distributes the copies among a group of nodes. Data is organized by table and identified by a row key called a primary key. The primary key determines which node the data is stored on. Copies of rows are called replicas. When data is first written, it is also referred to as a replica.
When your create a cluster, you must specify the following:
This section provides more detail about how the consistent hashing mechanism distributes data across a cluster in Cassandra. Consistent hashing partitions data based on the primary key. For example, if you have the following data:
| jim | age: 36 | car: camaro | gender: M |
| carol | age: 37 | car: bmw | gender: F |
| johnny | age: 12 | gender: M | |
| suzy | age: 10 | gender: F |
Cassandra assigns a hash value to each primary key:
| Primary key | Murmur3 hash value |
|---|---|
| jim | -2245462676723223822 |
| carol | 7723358927203680754 |
| johnny | -6723372854036780875 |
| suzy | 1168604627387940318 |
Each node in the cluster is responsible for a range of data based on the hash value:
| Node | Murmur3 start range | Murmur3 end range |
|---|---|---|
| A | -9223372036854775808 | -4611686018427387903 |
| B | -4611686018427387904 | -1 |
| C | 0 | 4611686018427387903 |
| D | 4611686018427387904 | 9223372036854775807 |
Cassandra places the data on each node according to the value of the primary key and the range that the node is responsible for. For example, in a four node cluster, the data in this example is distributed as follows:
| Node | Start range | End range | Primary key | Hash value |
|---|---|---|---|---|
| A | -9223372036854775808 | -4611686018427387903 | johnny | -6723372854036780875 |
| B | -4611686018427387904 | -1 | jim | -2245462676723223822 |
| C | 0 | 4611686018427387903 | suzy | 1168604627387940318 |
| D | 4611686018427387904 | 9223372036854775807 | carol | 7723358927203680754 |
Prior to version 1.2, you had to calculate and assign a single token to each node in a cluster. Each token determined the node's position in the ring and its portion of data according to its hash value. Although the design of consistent hashing used prior to version 1.2 (compared to other distribution designs), allowed moving a single node's worth of data when adding or removing nodes from the cluster, it still required substantial effort to do so.
Starting in version 1.2, Cassandra changes this paradigm from one token and range per node to many tokens per node. The new paradigm is called virtual nodes. Virtual nodes allow each node to own a large number of small ranges distributed throughout the cluster. Virtual nodes also use consistent hashing to distribute data but using them doesn't require token generation and assignment.
The top portion of the graphic shows a cluster without virtual nodes. In this paradigm, each node is assigned a single token that represents a location in the ring. Each node stores data determined by mapping the row key to a token value within a range from the previous node to its assigned value. Each node also contains copies of each row from other nodes in the cluster. For example, range E replicates to nodes 5, 6, and 1. Notice that a node owns exactly one contiguous range in the ring space.
The bottom portion of the graphic shows a ring with virtual nodes. Within a cluster, virtual nodes are randomly selected and non-contiguous. The placement of a row is determined by the hash of the row key within many smaller ranges belonging to each node.
Virtual nodes simplify many tasks in Cassandra:
For more information, see the article Virtual nodes in Cassandra 1.2.
To set up virtual nodes:
Set the number of tokens on each node in your cluster with the num_tokens parameter in the cassandra.yaml file. The recommended value is 256. Do not set the initial_token parameter.
Generally when all nodes have equal hardware capability, they should have the same number of virtual nodes. If the hardware capabilities vary among the nodes in your cluster, assign a proportional number of virtual nodes to the larger machines. For example, you could designate your older machines to use 128 virtual nodes and your new machines (that are twice as powerful) with 256 virtual nodes.
Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed.
The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later. When replication factor exceeds the number of nodes, writes are rejected, but reads are served as long as the desired consistency level can be met.
Two replication strategies are available:
Use only for a single data center. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or data center location).
Use NetworkTopologyStrategy when you have (or plan to have) your cluster deployed across multiple data centers. This strategy specify how many replicas you want in each data center.
NetworkTopologyStrategy places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack. NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
When deciding how many replicas to configure in each data center, the two primary considerations are (1) being able to satisfy reads locally, without incurring cross data-center latency, and (2) failure scenarios. The two most common ways to configure multiple data center clusters are:
Asymmetrical replication groupings are also possible. For example, you can have three replicas per data center to serve real-time application requests and use a single replica for running analytics.
To set the replication strategy for a keyspace, see CREATE KEYSPACE.
When you use NetworkToplogyStrategy, during creation of the keyspace strategy_options, you use the data center names defined for the snitch used by the cluster. To place replicas in the correct location, Cassandra requires a keyspace definition that uses the snitch-aware data center names. For example, if the cluster uses the PropertyFileSnitch, create the keyspace using the user-defined data center and rack names in the cassandra-topologies.properties file. If the cluster uses the EC2Snitch, create the keyspace using EC2 data center and rack names.