We Shall Have Order!
One of the more misunderstood aspects of Cassandra data modeling is that of how to properly order your data. In this article we will explore a typical modeling journey from “first attempt” to “end solution.” We will discuss the limitations of the ORDER BY clause, as well as explain how Cassandra clustering order works – and how to take advantage of that order when building a data model.
A common problem that new Cassandra users will attempt to solve is that of modeling time series data. One of the more typical early mistakes made in time series data modeling is designing a table that is dependent on a time as its primary key, like this:
CREATE TABLE posts ( posttime timestamp, postcontent text, postid uuid, userid bigint, PRIMARY KEY (posttime));
This is commonly accompanied by a question such as:
“How can I sort my result set by posttime?”
Of course – for this problem – designating posttime as a single primary key is not going to have the desired effect. Cassandra will place (DataStax 2015a) “data on each node according to the value of the partition key and the range that the node is responsible for.” It is important to understand that the order in which partitioned rows are returned, depends on the order of the hashed token values and not on the key values themselves. So if the goal is to order the queried results by posttime then the primary key definition is going to have to be different.
Some new users make the mistake of using the Byte Ordered Partitioner (BOP) to sort data by partition key. This is discouraged, given its propensity for hot spots and load balancing difficulties. Some have even gone so far as to identify its use as an anti-pattern. It is this author’s opinion that no new clusters should be built with the BOP. Often, a use case which may appear to be a “good fit” for the BOP can be solved with proper data modeling.
The correct way to model this is to first find a different column by which to partition the data. Proper partitioning and clustering keys will allow the data to be sorted and queried in the desired order. The column to designate as the partition key largely depends on the requirements of the application and the particular query you are trying to solve. Be mindful of the cardinality of your potential partition key. If it is too low you will get “hot spots” (poor data distribution), and if it is too high you will negate the benefits of the “wide row” data model (too little data to make ordering worthwhile).
Another incorrect path which new users can find themselves going down is to designating a “dummy” partition key. In this solution, the “dummy” key always has the same value (lowest possible cardinality), and its use can fool the user into thinking that the problem is solved. The two main issues with this are that of hot spots and (Strickland 2014) unbounded row growth. As the (constant) partition key value will be hashed the same on each insert, all of the data will be written to the same partition, thus creating hot spots of data. Additionally, Cassandra can only support (McFadin 2014) two billion columns per partition. With all data being written to that one partition, the chances of reaching that limit will increase over time. As these two points are significant potential issues, I cannot stress enough that using a “dummy” partition key is a terrible idea.
For our example, we will decide to partition our data by userid with a new table specifically designed to serve the query, like this:
CREATE TABLE postsbyuser ( userid bigint, posttime timestamp, postid uuid, postcontent text, PRIMARY KEY ((userid), posttime) ) WITH CLUSTERING ORDER BY (posttime DESC);
This particular table uses a compound primary key. The first column listed in the primary key definition is the partition key (userid) and the second column is a clustering column (posttime). After the partition key(s), data for each partition (DataStax 2015b) is “clustered by the remaining column or columns of the primary key definition.” Essentially, the clustering columns determine the on-disk sort order within each partition. Note that this table definition also uses the WITH clause, which indicates that the data should be clustered in descending order by posttime.
However, this can lead to another common incorrect assumption. Consider the following query and question:
> SELECT * FROM postsbyuser LIMIT 100;
“I am trying to query the 100 most-recent posts by user. Why is my data in random order even though CLUSTERING ORDER BY is specified in postsbyuser?”
In the example table above, queries will return data that is sorted according to the clustering key(s) only when a partition key is also specified. Without a partition key specified in the WHERE clause, the actual order of the result set then becomes dependent on the hashed values of userid. This is apparent whenpostsbyuser is queried without specifying a WHERE clause and using the token() function on userid:
SELECT userid, token(userid), posttime FROM postsbyuser;
userid | token(userid) | posttime --------+----------------------+-------------------------- 1 | -4069959284402364209 | 2015-01-25 13:25:00-0600 1 | -4069959284402364209 | 2015-01-25 13:22:00-0600 0 | -3485513579396041028 | 2015-01-25 13:21:00-0600 2 | -3248873570005575792 | 2015-01-25 13:28:00-0600 2 | -3248873570005575792 | 2015-01-25 13:27:00-0600 2 | -3248873570005575792 | 2015-01-25 13:26:00-0600
In any case, to serve the above query we need qualify it with the partition key (userid). This query will return the 100 most-recent posts where userid is equal to 2:
> SELECT * FROM postsbyuser WHERE userid=2 LIMIT 100;
While we are on the right track, one last issue that may still be encountered with this solution is (again) unbounded row growth. Depending on the frequency of user postings, this solution may be at risk for hitting the limit of two billion columns per partition (probably not, but we will assume so for the purposes of this example). A possible solution for this is to add an additional partition key. Let’s say that we know that the postsbyuser data will not exceed the limit of two billion columns per partition in a single calendar year. Then, we could solve this issue by using year as an additional partition key:
CREATE TABLE postsbyuseryear ( userid bigint, posttime timestamp, postid uuid, postcontent text, year bigint, PRIMARY KEY ((userid, year), posttime) ) WITH CLUSTERING ORDER BY (posttime DESC);
Now we can query the top most-recent posts by userid for a specific year, in descending order byposttime:
> SELECT userid, year, posttime, postcontent FROM postsbyuseryear WHERE userid=1 AND year=2015;
userid | year | posttime | postcontent --------+------+--------------------------+------------------------ 1 | 2015 | 2015-01-25 13:25:00-0600 | Game over, man! 1 | 2015 | 2015-01-25 13:22:00-0600 | B26354 filed & monitored 1 | 2015 | 2015-01-25 13:21:00-0600 | I aim to misbehave.
Note that the ORDER BY clause does not need to be specified in the SELECT, as the data will already be sorted based on the CLUSTERING ORDER specified in the table definition. The ORDER BY clause could be used if we wanted to alter the sort direction (“ASCending” vs. “DESCending”) of the result set.
In summary, when it comes to ordering your result sets in Cassandra, remember these points:
The clustering column(s) determine the data’s on-disk sort order only within a partition key.
Do not model your table with a “dummy” partition key (key that always has the same value).
It is not necessary to specify the ORDER BY clause on a query if your desired sort direction (“ASCending/DESCending”) already matches the CLUSTERING ORDER in the table definition.
On a CREATE TABLE command, Cassandra will specify CLUSTERING ORDER for you in the table definition if you did not specify it yourself. The default CLUSTERING ORDER on a table is comprised of your clustering keys in the “ASCending” sort direction.
Be mindful of the limit of two billion columns per wide row (partition), as well as the problems that accompany unbounded row growth.
For tables supporting queries for recent, time-based data, you may want to specify a “DESCending” sort direction in your table definition.
Do not create a new cluster using the Byte Ordered Partitioner. If still you think that you really need to do this, contact someone at DataStax support first.
- DataStax (2015a). Consistent Hashing. Retrieved from:https://www.datastax.com/documentation/cassandra/2.1/cassandra/architecture/architectureDataDistributeHashing_c.html
- DataStax (2015b). Compound Keys and Clustering. Retrieved from:https://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_compound_keys_c.html
- Denis M. (2012). Apache Cassandra Anti-Patterns. Slide 13. Strange Loop 2012. St. Louis, MO. Retrieved from: http://www.slideshare.net/mattdennis/strangeloop-2012-apache-cassandra-anti-patterns(slide 13).
- Strickland R. (2014). Cassandra High Availability. Packt Publishing Ltd. Birmingham, UK. (pp. 144-45).
This blog post originally appeared in Planet Cassandra.