DataStax Developer Blog

CQL improvements in Cassandra 2.0

By Sylvain Lebresne -  July 26, 2013 | 6 Comments

At the time of this writing, Cassandra 2.0.0-beta2 has just been released and it shouldn’t be too long before the final lands. In this blog post, we will describe the improvements and new features that C* 2.0 brings on the CQL front. I’ll note that none of the following improvements are backward breaking changes of the language, they are only additions to it.

ALTER DROP

CQL3 in Cassandra 1.2 does not allow to drop a CQL3 column. This is fixed in Cassandra 2.0, where if you have

    CREATE TABLE myTable (
        id text PRIMARY KEY,
        prop1 text,
        prop2 int,
        prop3 float
    )


then you are allowed to do:

    ALTER TABLE myTable DROP prop3;

As is expected of such statement, it will drop prop3 from the table definition but will also remove all data pertaining to that column in the database. This data removal is however performed lazily during compaction (compaction simply looks for dropped columns in the input sstables and doesn’t include them in the output; note that if you want to force the removal of dropped columns without waiting for compactions to automatically kick in, you can simply call nodetool upgradesstables) and the ALTER TABLE statement (that simply updates the table metadata to register the drop) will return quickly.

Conditional updates

Cassandra 2.0 introduces some support for lightweight transactions (using Paxos underneath). On the CQL front, this is exposed through the support of the IF keyword in INSERT, UPDATE and DELETE statements (the previous blog post on these lightweight transactions provides examples of this support in CQL so we don’t repeat them here).

Conditional schema modifications

Schema modification statements (CREATE/DROP KEYSPACE/TABLE/INDEX) also support a form of conditionals in Cassandra 2.0. This is particularly convenient if at the start of some insertion code you don’t know if a keyspace (or table, or index) exists or needs to be created. For that, you can now do

    CREATE KEYSPACE IF NOT EXISTS ks
               WITH replication = { 'class': 'SimpleStrategy',
                                    'replication_factor' : 3 };
    CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);


Similarly, you can issue conditional drops:

    DROP KEYSPACE IF EXISTS ks;

This syntax is merely a convenience: for instance, a (non-conditional) CREATE KEYSPACE statement will throw a specific exception if the keyspace already exists. So you can also issue such non-conditional creation and ignore the exception if it is thrown, but this new syntax offers a more concise way to achieve the same effect.

Triggers

Cassandra 2.0 also introduces experimental support for triggers and CQL offers a new syntax to register a trigger on a table, namely:

    CREATE TRIGGER myTrigger
                ON myTable
             USING 'org.apache.cassandra.triggers.InvertedIndex'


where 'org.apache.cassandra.triggers.InvertedIndex' is the Java class implementing the trigger in that example (this class is available at here if you are interested). It is of course also possible to drop such a trigger:

    DROP TRIGGER myTrigger ON myTable

Please note however that trigger support is currently experimental. In particular, while the CQL syntax above is unlikely to change, the Java interface that trigger need to currently implement will probably change.

Secondary indexes on PRIMARY KEY columns

In Cassandra 1.2, you can only create secondary indexes on CQL3 columns that are not part of the PRIMARY KEY definition. In other words, the following does not work in C* 1.2 but will in C* 2.0:

    CREATE TABLE timeline (
        event_id uuid,
        week_in_year int,
        created_at timeuuid,
        content blob,
        PRIMARY KEY ((event_id, week_in_year), created_at)
    );
    
    -- Invalid in Cassandra 1.2 but not in 2.0
    CREATE INDEX ON timeline (week_in_year);

Let me remark however that secondary indexing of collection columns is still not yet supported in C* 2.0. This will come later.

Aliases in SELECT

CQL supports a number of function calls on the column names selected by a SELECT. For instance, in cqlsh you can do:

cqlsh:ks> SELECT event_id, dateOf(created_at), blobAsText(content) 
            FROM timeline;

 event_id                | dateOf(created_at)       | blobAsText(content)
-------------------------+--------------------------+----------------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Something happened!?


While this is fine, it might not be convenient in practice to refer to a column named dateOf(created_at) in the result set, so a new alias feature as been added and you can now do:

cqlsh:ks> SELECT event_id, 
                 dateOf(created_at) AS creation_date,
                 blobAsText(content) AS content 
            FROM timeline;

 event_id                | creation_date            | content
-------------------------+--------------------------+----------------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Something happened!?

Preparing timestamp, ttl and limit

Cassandra 1.2 doesn’t allow you to use a bind marker for the TIMESTAMP and TTL properties of update statements, nor for the LIMIT property of SELECT statements. This is now fixed and you can for instance prepare statements like:

    SELECT * FROM myTable LIMIT ?;
    UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';

Native protocol improvements

On top of the improvements to the CQL language described above, Cassandra 2.0 introduces the 2nd version of the native protocol for CQL, which brings a number of improvements over the first version (Let’s note that client drivers will need to be modified to support that new protocol version (and its improvements) but that Cassandra 2.0 still support the first version of the protocol, so client drivers that work against C* 1.2 will work with C* 2.0).

The main improvements made in native protocol version 2 are:

  • Batching of prepared statements: this allows to execute individually prepared statements in a BATCH.
  • One-off prepare and execute statements: this allows to pass values for a statement as binary (to avoid a conversion to string for blobs for instance) even when you don’t want to prepare the statement (because you don’t plan on executing that query more than once).
  • Automatic/incremental paging of SELECT statements: With Cassandra 1.2, the result set for a SELECT query is always sent in its entirety, which might lead to query timeout and/or out of memory exceptions (either on the server or client side) if this result set is big. Meaning that users need to be wary of that problem and should always include a LIMIT if the select may yield too much result. This is rather inconvenient and error prone however and the native protocol version 2 fixes that by allowing incremental (and transparent for the user) fetching of a result set.
  • SASL for authentication: The first version of the native protocol has a custom and relatively limited authentication support. As a consequence, it is hard (and very inconvenient if not impossible) to provide secure authentication. Version 2 improves on that situation by replacing this custom mechanism by SASL authentication.
  • More compact result sets for prepared statements

We will describe more precisely (and with examples) those improvements in a follow up blog post so stay tuned.



Comments

  1. DuyHai DOAN says:

    Sorry for the bad url formatting of my previous comment :(

  2. Sylvain Lebresne says:

    > Though, is it possible to mix PS and plain text query in atomic batches ?

    It is, yes.

    > I remember reading somewhere that the impl code of your trigger should be put on the server side isn’t it ?

    That’s correct. I only wanted to mention the trigger syntax here but triggers should probably be the subject of a specific blog post…

  3. cowardlydragon says:

    I’ve read that CQL3 collections actually implement sets/map members as separate columns under the hood but CQL3 doesn’t seem to have queries to get a map’s specific value for a key, or test set membership/existence, which should be efficient column query checks, shouldn’t they?

    Is that coming in Cass 2.0?

  4. cowardlydragon says:

    Will there be CQL enhancements for collections to do queries like:

    - map key / set / list member existence, so we don’t have to download an entire collection to test for existence?
    SELECT exists(setcolumn["setmember"]) FROM table WHERE rowkey = ‘blah’

    - selecting a value or set of values for a key or set of keys?

    SELECT mapcol["key1"], mapcol["key2"] FROM table WHERE rowkey=’blah’

    SELECT listcol[1] FROM table WHERE rowkey = ‘blah’

    Since I believe collection members are implemented as individual sparse columns on the row (I read that somewhere…), efficient querying mechanisms that map to those columns would be nice.

  5. Is there any support for performing IN() on column keys for keyspaces that contain collection fields? (Not performing IN() on a collection field itself, but on the column key.)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>