DataStax Developer Blog

New CQL features coming in 2.0.6

By Sylvain Lebresne -  February 28, 2014 | 8 Comments

The next release (as of this writing) on the 2.0 branch of Cassandra, Cassandra 2.0.6, adds a few noteworthy features to CQL. This post details those.

Batching of conditional updates

The first addition to CQL in Cassandra 2.0.6 is the ability to batch conditional updates. Cassandra 2.0 introduced lightweight transactions in the form of conditions on updates, but it wasn’t possible to batch such updates so far. Cassandra 2.0.6 changes that, allowing to group multiple conditional updates in a batch as long as all those updates apply to the same partition (the underlying Paxos implementation works at the granularity of the partition, so it is not possible to batch conditions on different partitions).

Note that it is possible to group updates that have conditions with others that don’t, but as soon as one statement in a batch uses a condition, the whole batch will be committed using a single Paxos proposal, and this if all of the conditions contained in the bach apply.

Batching conditional updates offers a few interesting possibilities when combined with another new feature: static columns.

Static columns

A static column is a special column that is shared by all the rows of the same partition. Let’s take an example: suppose that we want to store per-user bills that needs to be paid and keep a per-user balance amount of what remains to be paid. The invariant we want to maintain is that the balance is always the sum of all unpaid bills:

  CREATE TABLE bills (
     user text,
     balance int static,
     expense_id int,
     amount int,
     description text,
     paid boolean,
     PRIMARY KEY (user, expense_id)
  );

Because the balance is static, we will only have one balance for each user in practice, no matter how many bills a user have. We can then insert a few expenses:

  BEGIN BATCH
      INSERT INTO bills (user, balance) VALUES ('user1', -8) IF NOT EXISTS;
      INSERT INTO bills (user, expense_id, amount, description, paid)
                 VALUES ('user1', 1, 8, 'burrito', false);
  APPLY BATCH;
  
  BEGIN BATCH
      UPDATE bills SET balance = -208 WHERE user='user1' IF balance = -8;
      INSERT INTO bills (user, expense_id, amount, description, paid)
                 VALUES ('user1', 2, 200, 'hotel room', false);
  APPLY BATCH;

Note that we are allowed to update a static column in a statement that only provide the partition key, which wouldn’t be correct for a normal column (since the primary key contains a clustering column). Also note that due to the use of (batched) conditional updates, we guarantee that the balance will always correspond to the amount due for the unpaid bills. This wouldn’t be possible if the balance was stored in a separate table since a batch with conditional updates cannot span multiple partitions. At that point we will have:

  SELECT * FROM bills;
  
   user  | expense_id | balance | amount | description | paid
  -------+------------+---------+--------+-------------+-------
   user1 |          1 |    -208 |      8 |     burrito | False
   user1 |          2 |    -208 |    200 |  hotel room | False
  

Note that both rows share the same balance. I would then be possible to pay a bill with:

  BEGIN BATCH
      UPDATE bills SET balance=-200 WHERE user='user1' IF balance=-208;
      UPDATE bills SET paid=true
                 WHERE user='user1' AND expense_id=1
                    IF paid=false;
  APPLY BATCH;
  
  SELECT * FROM bills;
  
   user  | expense_id | balance | amount | description | paid
  -------+------------+---------+--------+-------------+-------
   user1 |          1 |    -200 |      8 |     burrito |  True
   user1 |          2 |    -200 |    200 |  hotel room | False
  

This example is just one use of static columns, but they can be more generally used for optimizations, when you want to store some per-partition “static” information alongside clustered rows and still want to be able to query both of those with a single SELECT.

Please note that:

  • only columns not part of the PRIMARY key can be static.
  • only tables with at least one clustering column can have static columns (a table with no clustering column has only one row per partition and as such all columns are already static by construction).
  • tables with the COMPACT STORAGE option cannot have static columns.

Increased control for "slicing" over multiple clustering columns

The last improvement to CQL in Cassandra 2.0.6 is a new syntax that allows slightly more general "slicing" over the rows of a given partition when there is more than one clustering column. Consider the following (somewhat artificial) example:

  CREATE TABLE timeline (
      day text,
      hour int,
      min int,
      sec int,
      value text,
      PRIMARY KEY (day, hour, min, sec)
  );
  
  INSERT INTO timeline (day, hour, min, sec, value)
                VALUES ('12 Jan 2014', 3, 43, 12, 'event1');
  INSERT INTO timeline (day, hour, min, sec, value)
                VALUES ('12 Jan 2014', 3, 52, 58, 'event2');
  INSERT INTO timeline (day, hour, min, sec, value)
                VALUES ('12 Jan 2014', 4, 37, 01, 'event3');
  INSERT INTO timeline (day, hour, min, sec, value)
                VALUES ('12 Jan 2014', 4, 37, 41, 'event3');
  INSERT INTO timeline (day, hour, min, sec, value)
                VALUES ('12 Jan 2014', 6, 00, 34, 'event4');
  
  SELECT * FROM timeline;
  
   day         | hour | min | sec | value
  -------------+------+-----+-----+--------
   12 Jan 2014 |    3 |  43 |  12 | event1
   12 Jan 2014 |    3 |  52 |  58 | event2
   12 Jan 2014 |    4 |  37 |   1 | event3
   12 Jan 2014 |    4 |  37 |  41 | event3
   12 Jan 2014 |    6 |   0 |  34 | event4
  

and suppose that you want all events for the 12th of January 2014 between 3:50:00 and 4:37:30 (I said it was somewhat artificial). Doing:

  SELECT * FROM timeline WHERE day='12 Jan 2014'
                           AND hour >= 3 AND min >= 50
                           AND hour <= 4 AND min <= 37 AND sec <=30;

doesn't work. No only is that not allowed but no row can satisfy such query since no time can have both min >= 50 AND min <= 37. This is not the query we are looking for. So to allow this type of query, Cassandra 2.0.6 will introduce the following syntax:

  SELECT * FROM timeline WHERE day='12 Jan 2014'
                           AND (hour, min) >= (3, 50)
                           AND (hour, min, sec) <= (4, 37, 30);
  
   day         | hour | min | sec | value
  -------------+------+-----+-----+--------
   12 Jan 2014 |    3 |  52 |  58 | event2
   12 Jan 2014 |    4 |  37 |   1 | event3
  

which pretty directly translates the intent.



Comments

  1. DuyHai DOAN says:

    Great improvements on CQL3 side. That opens possibility for a lot of new useful modeling patterns.

  2. Gary Stewart says:

    2 thumbs up for static columns! :-)
    This makes modeling on partition with 1:1 and 1:* data much easier. So when will the DSE version come :-)

  3. Jānis Kalniņš says:

    Sylvain, please, could you clarify if it will be possible to mix conditional queries on one partition with a non-conditional ones on another partition in a single batch operation?

    For example if I have a user account table and every account must have a unique email address would it be possible to insert a new email address with IF NOT EXISTS to an email index table and then update an email address field in an accounts table all in one batch?

  4. Vidhya says:

    Can a distributed counter serve as a static column?

  5. Sylvain Lebresne says:

    @Jānis It’s not possible currently no.

  6. cowardlydragon says:

    Can we use collections on a static column?

  7. Nikolai Grigoriev says:

    Great thing these static columns. I think it closes one of the gaps between CQL and Cassandra data model.

  8. ganesh says:

    Can static columns be indexed?
    Thanks

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>