DataStax Developer Blog

What’s New in Cassandra 0.8, Part 1: CQL, the Cassandra Query Language

By Jonathan Ellis -  June 9, 2011 | 11 Comments

Why CQL?

Pedantic readings of the NoSQL label aside, Cassandra has never been against SQL per se. SQL is the original data DSL, and quite good at what it does.

Cassandra originally went with a Thrift RPC-based API as a way to provide a common denominator that more idiomatic clients could build upon independently. However, this worked poorly in practice: raw Thrift is too low-level to use productively, and keeping pace with new API methods to support (for example) indexes in 0.7 or distributed counters in 0.8 is too much for many maintainers to keep pace with.

CQL, the Cassandra Query Language, addresses this by pushing all implementation details to the server; all the client has to know for any operation is how to interpret “resultset” objects. So adding a feature like counters just requires teaching the CQL parser to understand “column + N” notation; no client-side changes are necessary.

CQL drivers are also hosted in-tree, to avoid the problems caused in the past by client proliferation.

At the same time, CQL is heavily based on SQL–close to a subset of SQL, in fact, which is a big win for newcomers: everyone knows what “SELECT * FROM users” means. CQL is the first step to making the learning curve on the client side as gentle as it is operationally.

(The place where CQL isn’t a strict subset of SQL–besides TABLE vs COLUMNFAMILY tokens, which may change soon–is in support for Cassandra’s wide rows and heirarchical data like supercolumns.)

A taste of CQL

Here we’ll use the cqlsh tool to create a “users” columnfamily, add some users and an index, and do some simple queries. (See our documentation for details on installing cqlsh.) Compare to the same example using the old cli interface if you’re so inclined.


cqlsh> CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;
cqlsh> USE test;

cqlsh> CREATE COLUMNFAMILY users (
   ...     key varchar PRIMARY KEY,
   ...     full_name varchar,
   ...     birth_date int,
   ...     state varchar
   ... );

cqlsh> CREATE INDEX ON users (birth_date);
cqlsh> CREATE INDEX ON users (state);

cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('bsanderson', 'Brandon Sanderson', 1975, 'UT');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('prothfuss', 'Patrick Rothfuss', 1973, 'WI');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('htayler', 'Howard Tayler', 1968, 'UT');

cqlsh> SELECT key, state FROM users;
        key | state |
 bsanderson |    UT |
  prothfuss |    WI |
    htayler |    UT |

cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
        KEY | birth_date |         full_name | state |
 bsanderson |       1975 | Brandon Sanderson |    UT |

Current status and the road ahead

cqlsh (included with the Python driver) is already useful for testing queries against your data and managing your schema. The above example works as written in 0.8.0; ALTER COLUMNFAMILY, TTL support, and counter support are already done for 0.8.1.

Supercolumn support is the main missing feature and will probably come in 0.8.2 (minor releases are done roughly monthly). Removing Thrift as a requirement is a longer-term goal.

CQL does not change the underlying Cassandra data model; in particular, there is no support for JOINs. (For doing analytical queries with SQL against Cassandra, see DataStax Enterprise.)

The Python and Node.js CQL drivers are ready for wider use; JDBC and Twisted are almost done, and PHP and Ruby are being worked on.

In short, CQL is ready for client authors to get involved. Most application developers should stick with the old Thrift RPC-based clients until 0.8.1*. If you want to give CQL a try, see the installation instructions, and for developers, the drivers tree was recently moved here.

*One such client, Hector, already has CQL query support.



Comments

  1. Thank you very much for the introduction to the new (awesome) CQL! I just started using Cassandra and I really like it

  2. Senthil Ramaswamy says:

    This is great. I tried to add a new DataConnection in eclipse using the cql jdbc driver in eclipse. It crashed, hopefully it will work with data and reporting tools with JDBC metadata soon

  3. Jonathan Ellis says:

    Yes, that should work a lot better with the next JDBC driver release.

  4. Nilabja says:

    Really awesome…

  5. I like CQL! Great job! Keep up the good work.

    We will implement application logging on Cassandra with CQL.

  6. Sridhar says:

    Thanks for posting this. I was looking for way to filter the rows in cassandra database based on the column values. Can this be done using Hector API?

  7. dedou says:

    Cassandra is strongly recommended in the analytical process, this is why I want to test operations on aggregations of data I loaded into cassandra. But CQL does not know the sum and average function and all the aggregation operations

  8. Jonathan Ellis says:

    CQL is designed to address simple real-time queries, not analytical ones. Aggregation, joins, and subqueries are all not supported.

    Full analytical SQL is supported via DataStax Enterprise. See http://www.datastax.com/products/enterprise and http://www.datastax.com/wp-content/uploads/2011/09/WP-DataStax-Enterprise.pdf for more details.

  9. Ansar Rafique says:

    I am looking for CQL JDBC Jar file. Where I can download this jar ?

  10. Vyjayanthi says:

    Can some one help!
    How to use CQl 3.0 with JDBC drivers available for cassandra

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>