DataStax Developer Blog

What’s New in CQL 3.0

By paul cannon - April 20, 2012 | 25 Comments

Cassandra 1.1 brings with it some important new developments in the CQL modeling and query language. The changes are deep enough, in fact, that they require a whole new version of the CQL language specification. Most of these changes are related to the big push underway to encourage the use of composite keys for simpler denormalization. See the earlier post on schema in Cassandra 1.1 for a quick explanation on how composites can help avoid manual denormalization when you need to lookup on multiple dimensions of your data.

The new version of CQL is numbered 3.0. To remind you, the flavor of CQL we have already come to know and love in previous versions of Cassandra is CQL 2.0. And don’t worry, CQL 2 is still supported in Cassandra 1.1 for backwards compatibility, but we recommend new applications be developed with CQL 3.

To use CQL 3, tell your CQL driver that you intend to use it. The method to do so differs by the driver. For example, in the Python driver, you specify the desired CQL version in the connect() call:

  connection = cql.connect('localhost:9160', cql_version='3.0.0')

The biggest change in CQL 3 is that the CREATE TABLE syntax now allows multi-column primary keys.  As explained in the schema post, this is translated underneath the covers to wide Cassandra rows using composite column names, so it’s becoming more important to understand that storage-engine rows are not the same thing as CQL rows.  If you create and use a CQL 3 table with a compound primary key, then inspect the corresponding column family using the Thrift interface, you will see the differences in layout.

Now, to be more specific, the CQL 3 CREATE TABLE syntax still allows table creation in much the same way as before, for simple keys:

   CREATE TABLE crew (
       name text PRIMARY KEY,
       bunk int,
       alliance_trouble boolean
   );

But now you have the option of using more, using a syntax borrowed from SQL:

   CREATE TABLE altercations (
       instigator text,
       started_at timestamp,
       ships_destroyed int,
       energy_used float,
       alliance_involvement boolean,
       PRIMARY KEY (instigator, started_at)
   );

A CQL 3 table’s primary key can have any number (1 or more) of component columns, but there must be at least one column which is not part of the primary key.

Inserting data into these composite-key tables is, as you might expect, the same as before:

   INSERT INTO crew (name, bunk, alliance_trouble) VALUES ('River Tam', 13, 'true');
   INSERT INTO altercations (instigator, started_at, ships_destroyed,
                             energy_used, alliance_involvement)
                     VALUES ('Jayne Cobb', '7943-07-23', 2, 4.6, 'false');
   UPDATE crew SET bunk=9 WHERE name='Book';
   UPDATE altercations SET energy_used = 2.3
          WHERE instigator='Jubal Early' AND started_at='7943-12-19';

Selecting data from these tables is likewise nearly the same, but there are a few extra requirements in order to help avoid doing things inefficiently.  In CQL 2, WHERE clauses had to operate on the table key alone (or on an indexed column alone).  Obviously, there is now a need for selecting on multiple values at the same time:

   SELECT * FROM altercations WHERE instigator='Jayne Cobb' AND started_at > '7943-08-24';
   SELECT energy_used, ships_destroyed FROM altercations
          WHERE instigator='Malcolm Reynolds' ORDER BY started_at DESC;

Notice that WHERE clauses can now include greater-than/less-than comparisons on columns other than the first. As long as all previous key-component columns have already been identified with strict = comparisons, the last given key component column can be any sort of comparison.

Composite keys means it now makes sense for CQL to sport the ORDER BY syntax in SELECT queries as well, but it’s still not nearly as flexible as you might be used to, doing ad-hoc queries in SQL.  ORDER BY clauses can only select a single column, and that column has to be the second column in a composite PRIMARY KEY. This holds even for tables with more than 2 column components in the primary key. Ordering can be done in ascending or descending order, default ascending, and specified with the ASC or DESC keywords.

An important change that comes with this new emphasis on component-key-based denormalization is that direct access to the storage engine via column ranges is no longer supported in CQL 3. CQL 3 allows performing those queries with normal predicates in the WHERE clause instead, operating on logical columns. You just need to represent your wide-row data in a way that can fit a CQL 3 table (this should be possible for anything you have already modeled for Cassandra). All of this means there is no column range operator “..”, and no FIRST or REVERSED clauses for column selection in SELECT queries. They’re not necessary anymore.

Here’s an example of this sort of adjustment to a data model. Suppose Captain Reynolds keeps track of every ship registered by his sensors as he flies through space. Using CQL 2, he might have had a column family like this:

   CREATE TABLE seen_ships (day text PRIMARY KEY)
                WITH comparator=timestamp AND default_validation=text;
   INSERT INTO seen_ships (day, '7943-02-02 00:08:22')
                   VALUES ('199-A/4', 'Sunrise Avenger');
   SELECT FIRST 12 '7943-02-03'..'7943-02-28' FROM seen_ships WHERE day='199-A/4';

So each day-code is a Cassandra row, and events are added in with timestamps. Rows could become extremely wide, depending on how bad traffic is in space that day. One simple way to adjust this to CQL 3 would be something like:

   CREATE TABLE seen_ships (
       day text,
       time_seen timestamp,
       shipname text,
       PRIMARY KEY (day, time_seen)
   );
   SELECT * FROM seen_ships WHERE day='199-A/4'
            AND time_seen > '7943-02-03' AND time_seen < '7943-02-28'
            LIMIT 12;

Notice there shouldn’t be a need to set a custom comparator or default_validation class in CQL 3 either.

If you’re paying close attention and also quite space-conscious, you may have noticed that while this alternative may work for Captain Reynolds, it’s going to cost more in space than it would have under the CQL 2 approach—since for every piece of data stored, the column name needs to be stored along with it. This is the reason for the WITH COMPACT STORAGE directive discussed in the already-mentioned schema blog. With this addition:

   CREATE TABLE seen_ships (
       day text,
       time_seen timestamp,
       shipname text,
       PRIMARY KEY (day, time_seen)
   ) WITH COMPACT STORAGE;

Mal’s data will be stored essentially the same as it was under CQL 2. One downside is that it will be trickier to add or remove columns to the model with this approach (in fact, support for adding columns to COMPACT STORAGE tables is not yet ready, and it may never be worth adding support for deleting them).

That pretty much covers the major additions to CQL.  There are some smaller changes as well, but not very many. For example, column names which don’t parse as valid CQL identifiers now need to be enclosed in double quotes (in CQL 2, they had to be enclosed in single quotes, just like string literals). But we’re probably past the point here where it’s interesting enough for a blog post.

CQL 3 may not yet be as well supported as CQL 2 by Cassandra-related tools and drivers (I’m looking at you, cqlsh!) but at least minimal support should exist in all major tools and drivers, and ubiquitous first-class support is a near-term goal.

EDIT 2012-04-21 15:07 UTC – clarified that no functionality has been removed in CQL 3.
EDIT 2012-04-21 15:11 UTC – changed claim about possibility of adding columns to COMPACT STORAGE- apparently support is in the works
EDIT 2012-04-23 15:58 UTC – fixed link to previous schema blog post
EDIT 2012-04-30 21:25 UTC – fixed argument to python cql.connect() function for getting CQL 3.
EDIT 2012-08-19 04:31 UTC – quoted “true” and “false” literals to be valid modern CQL 3.


Comments

  1. Nagaraj J says:

    Congrats. The feature set is really get especially the order by feature.

    While going through the code in both cql/cql3 , i realised, non-indexed columns are not supported in where clause even in presence of primary key.

    For ex:

    select * from cf where key=’ ‘ and non-indexed-col=’ ‘ results in

    “No indexed columns present in by-columns clause with Equal operator”

    But i have the primary key in the where clause . Hence not sure why i need a indexed column in there.

    Any particular reason for this behavior . I feel with wide rows this use-case will arise more often than before.

    Thanks
    Nagaraj

  2. paul cannon says:

    @Nagaraj: CQL makes an effort not to allow queries which would be prohibitively inefficient or meaningless given the underlying data model. Your example query falls into the latter category, since once you have selected a particular key value, there is no possible further selectivity (primary keys are unique). So:

    select * from cf where key=’ ‘;

    ..either results in 1 or 0 rows. It’s simple to determine in the client whether non_indexed_col has the value you want or not, and take the appropriate action.

    This is one of the disconnects between CQL and SQL: in SQL, it’s usually good practice to move as much logic as you can into the database queries. CQL is only as expressive as necessary to be able to use the Cassandra data model in a more familiar way.

  3. Nagaraj J says:

    @Paul Thanks for the explanation that helps a lot.

    I am just thinking with CQL 3 where Partition key is one component of a composite there can be more than 1 rows. Hence thought this might be useful the way this is supported with secondary indicies in cql 2. However filtering on client side isn’t that costly as you said.

    Secondary index support for composite is what we are looking forward to.

  4. paul cannon says:

    @Nagaraj: Yes, definitely it makes sense to select on columns which are part of a composite key, and you’ll notice that works fine.

    create table foo (a text, b text, c text, primary key (a, b));
    select * from foo where a = ‘x’ and b = ‘y’; — works

  5. It’s a great step to easy of use. Good work!

    When will the CQL 3.0 document be published? The Reversed Comparators cannot be created in cql3:

    REM (reversed=true),

    create columnfamily NoCache_Descending
    (id int PRIMARY KEY)
    with comparator = ‘UTF8Type(reversed=true)’
    and default_validation=varchar
    ;

    Thanks,
    Charlie | DBA

  6. Enable CQL 3.0 in Python:
    conn_cass = cql.connect(host=’localhost’, port=9160, keyspace=’demo’, …, cql_version=’3.0′)

  7. Enable CQL 3.0 in Python: cql_version=’3.0.0′

  8. paul cannon says:

    Charlie- good call, the number needs to be ’3.0.0′. I forget, since cqlsh accepts shorter version numbers and pads as needed. I’ll fix that in the article.

    Datastax’s CQL 3 documentation is under final review and will be released soon, as part of the Cassandra 1.1 documentation, I believe.

    As for Reversed, you’re right, CQL 3 does not support it yet. It will soon, though. See https://issues.apache.org/jira/browse/CASSANDRA-4004 to keep track of the status. In the meantime, perhaps ORDER BY … DESC will get you what you need.

  9. Tamil Selvan says:

    Why is there no mention about updating Composite row or column keys? Like..
    key_validation_class = ‘CompositeType(UTF8Type, UTF8Type)’
    and comparator = ‘CompositeType(UTF8Type, UTF8Type)’
    Does this version of CQL support this feature?

  10. paul cannon says:

    Tamil- A columnfamily’s comparator should not ever be changed, since that would make all on-disk ordering become invalid. This includes other columns which are part of a composite primary key; changing them would change the comparator. The key_validation_class can be changed through CQL, though—the one affecting the actual partitioning key = the first member of a composite key—using the ALTER TABLE command.

  11. Roland Mechler says:

    I’m having trouble deleting a row from a table with composite keys. Am I doing something wrong?

    cqlsh:keyspace1> CREATE TABLE MyTable (part1 text, part2 text, data text, PRIMARY KEY(part1, part2));
    cqlsh:keyspace1> INSERT INTO MyTable (part1, part2, data) VALUES(‘a’, ‘b’, ‘c’);
    cqlsh:keyspace1> SELECT * FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    part1 | part2 | data
    ——-+——-+——
    a | b | c

    cqlsh:keyspace1> DELETE FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    cqlsh:keyspace1> SELECT * FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    part1 | part2 | data
    ——-+——-+——
    a | b | c

  12. Roland Mechler says:

    Oh, it works if you specify the column(s) to delete:

    cqlsh:keyspace1> DELETE data FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    cqlsh:keyspace1> SELECT * FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    cqlsh:keyspace1>

    I would have expected DELETE FROM to select all columns for deletion. Is the behaviour I’m seeing intentional?

  13. the paul says:

    Roland- you’re seeing https://issues.apache.org/jira/browse/CASSANDRA-4193 in action there. You’ve got a composite-key columnfamily, and row deletes there aren’t as straightforward as existing support for row tombstones and column tombstones (since the CQL rows deleted do not correspond exactly to storage-engine rows).

    This effect should be fixed in the next version, but in the meantime, you’ve hit upon the right workaround.

  14. Roland Mechler says:

    Thanks for the info, Paul. I came across a slightly different issue. Similar scenario, but now I have a 3 part composite key and I want to delete rows based on the first 2 parts of the key. SELECT works using 2 parts of the key, but DELETE fails with the error:

    Bad Request: Missing mandatory PRIMARY KEY part part3

    (see details below). Is there a reason why deleting based on the first 2 parts should not work? I.e., is it just currently not supported, or is it a permanent limitation?

    Note that deleting based on just the first part of the key will work… deletes all matching rows.

    cqlsh:Keyspace1> CREATE TABLE MyTable (part1 text, part2 text, part3 text, data text, PRIMARY KEY(part1, part2, part3));
    cqlsh:Keyspace1> INSERT INTO MyTable (part1, part2, part3, data) VALUES (‘a’, ‘b’, ‘c’, ‘d’);
    cqlsh:Keyspace1> SELECT * FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    part1 | part2 | part3 | data
    ——-+——-+——-+——
    a | b | c | d

    cqlsh:Keyspace1> DELETE FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    Bad Request: Missing mandatory PRIMARY KEY part part3
    cqlsh:Keyspace1> DELETE data FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    Bad Request: Missing mandatory PRIMARY KEY part part3
    cqlsh:Keyspace1> DELETE FROM MyTable WHERE part1 = ‘a’;
    cqlsh:Keyspace1> SELECT * FROM MyTable WHERE part1 = ‘a’ AND part2 = ‘b’;
    cqlsh:Keyspace1>

  15. “deleting based on just the first part of the key(that is the partition key) will work” is interesting.

    delete by leading part of the composite key not working is reasonable based on Cassandra wide row physical data model. It could be a new feature if it will work in future release. ^_^

    Thanks,
    Charlie

  16. Tamil says:

    Hi paul cannon,
    I just asked for the syntax of using composite keys in select, update and insert statements

  17. Vyjayanthi says:

    Can some one help!
    How to use CQL 3.0 with JDBC drivers or Hector

  18. AndyB says:

    We’re keen to give this a go via PHP. Is there any timescale yet on an update to pdo-cassandra to allow us to select CQL 3? (It defaults to 2, which doesn’t have composite keys).

    Thanks,
    Andy

  19. AndyB says:

    I tried the examples on this page but got an odd error (using:
    [cqlsh 2.2.0 | Cassandra 1.1.2 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
    )

    This works ok:

    cqlsh:test> CREATE TABLE altercations (
    … instigator text,
    … started_at timestamp,
    … ships_destroyed int,
    … energy_used float,
    … alliance_involvement boolean,
    … PRIMARY KEY (instigator, started_at)
    … );

    …but this throws an error:

    cqlsh:test> INSERT INTO altercations (instigator, started_at, ships_destroyed,energy_used, alliance_involvement) VALUES (‘Jayne Cobb’, ’7943-07-23′, 2, 4.6, false);

    The error is:

    Bad Request: line 3:65 no viable alternative at input ‘false’

    I can’t see what the problem is in the example – it looks like it should work.

    Anyway, I tried putting false in single quotes, and that worked! (Surely it shouldn’t, because boolean should expect a boolean value not a string…!)

    Anyway, in case anyone discovered this too, here’s what worked:

    cqlsh:test> insert into altercations (instigator, started_at, ships_destroyed, energy_used, alliance_involvement) values (‘Jayne Cobb’, ’7943-07-23′, 2, 4.6, ‘false’);

    And here’s the data:

    cqlsh:test> select * from altercations;
    instigator | started_at | alliance_involvement | energy_used | ships_destroyed
    ————+————————–+———————-+————-+—————–
    Jayne Cobb | 7943-07-23 00:00:00+0000 | False | 4.6 | 2

    Note how the value of the boolean isn’t the string I passed (which was ‘false’) but False, which gives me confidence that under the hood it actually has stored a boolean.

  20. paul cannon says:

    @AndyB: you’re right, somewhere along the line CQL3 must have stopped recognizing ‘true’ and ‘false’ as valid keywords/identifiers. Possibly they were never meant to be such; CQL2 was much too lenient (imo) in allowing unquoted words to be used as string literals where values are expected.

    Just quote your ‘true’s and ‘false’s, and everything will be good. It’s the same situation with PostgreSQL- and MySQL-flavored SQL, iirc.

  21. paul cannon says:

    (I edited the post to reflect that. Thanks, AndyB)

  22. Jose Walker says:

    What if I store JSON or XML within a Column!? and in the WHERE clause I would like to query for an specific value of that JSON or XML? I’ve been looking all around whether there is any function within CQL that might help to accomplish something like that. How difficult is to extend CQL in order to incorporate something like this!? I pretty much guess that not everybody stores just simple types (string, int, bool, etc) right!? I Need a bit of help and light in here…

    Regards,

    Jose.

  23. paul cannon says:

    Jose- it’s easy to query for a specific value of a column, no matter what you’ve stored in it (WHERE COLUMN=’‘). But I think that you mean querying for specific values or attributes /within/ the xml/json. If that’s the case, Cassandra doesn’t do that on its own. It’s optimized for super-fast and distributable row-key/column lookups, and not for complex analysis of your stored data.

    You may want to look at DataStax Enterprise’s Solr integration, though: http://www.datastax.com/docs/datastax_enterprise2.1/search/index . It may meet your needs.

  24. paul cannon says:

    Anyone else coming by here for CQL usage answers: you will have much better luck getting timely and informed responses by asking your question on the user mailing list, on IRC, in the DataStax forums, or on Stack Overflow (tagged “cql”). This isn’t a very good place.

    Stack Overflow: http://stackoverflow.com/questions/tagged/cql

    Mailing lists: http://wiki.apache.org/cassandra/FrontPage#line-55

    IRC: http://wiki.apache.org/cassandra/IRC

    Datastax forums: http://www.datastax.com/support-forums/

  25. Ed says:

    How to update CQL to 3.0 ?

    Our Current version is
    [cqlsh 2.2.0 | Cassandra 1.1.4 | CQL spec 2.0.0 | Thrift protocol 19.32.0]

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>