|Understanding the architecture|
A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a hash function for computing the token (it's hash) of a partition key. Each row of data is uniquely identified by a partition key and distributed across the cluster by the value of the token.
Both the Murmur3Partitioner and RandomPartitioner use tokens to help assign equal portions of data to each node and evenly distribute data from all the tables throughout the ring or other grouping, such as a keyspace. This is true even if the tables use different partition keys, such as usernames or timestamps. Moreover, the read and write requests to the cluster are also evenly distributed and load balancing is simplified because each part of the hash range receives an equal number of rows on average. For more detailed information, see Consistent hashing.
Cassandra offers the following partitioners:
The Murmur3Partitioner is the default partitioning strategy for new Cassandra clusters and the right choice for new clusters in almost all cases.
Set the partitioner in the cassandra.yaml file:
The Murmur3Partitioner provides faster hashing and improved performance than the previous default partitioner (RandomPartitioner).
You can only use Murmur3Partitioner for new clusters; you cannot change the partitioner in existing clusters. If you are switching to the 1.2 cassandra.yaml be sure to change the partitioner setting to match the previous partitioner.
The Murmur3Partitioner uses the MurmurHash function. This hashing function creates a 64-bit hash value of the partition key. The possible range of hash values is from -263 to +263.
When using the Murmur3Partitioner, you can page through all rows using the token function in a CQL query.
The default partitioner prior to Cassandra 1.2.
Although no longer the default partitioner, you can use the RandomPartitioner in version 1.2, even when using virtual nodes (vnodes). However, if you don't use vnodes, you must calculate the tokens, as described in Generating tokens.
The RandomPartition distributes data evenly across the nodes using an MD5 hash value of the row key. The possible range of hash values is from 0 to 2127 -1.
When using the RandomPartitioner, you can page through all rows using the token function in a CQL query.
Use for ordered partitioning
Cassandra provides the ByteOrderedPartitioner for ordered partitioning. This partitioner orders rows lexically by key bytes. You calculate tokens by looking at the actual values of your partition key data and using a hexadecimal representation of the leading character(s) in a key. For example, if you wanted to partition rows alphabetically, you could assign an A token using its hexadecimal representation of 41.
Using the ordered partitioner allows ordered scans by primary key. This means you can scan rows as though you were moving a cursor through a traditional index. For example, if your application has user names as the partition key, you can scan rows for users whose names fall between Jake and Joe. This type of query is not possible using randomly partitioned partition keys because the keys are stored in the order of their MD5 hash (not sequentially).
Although having the capability to do range scans on rows sounds like a desirable feature of ordered partitioners, there are ways to achieve the same functionality using table indexes.
Using an ordered partitioner is not recommended for the following reasons: