What’s New in Cassandra 0.8, Part 1: CQL, the Cassandra Query Language
Cassandra originally went with a Thrift RPC-based API as a way to provide a common denominator that more idiomatic clients could build upon independently. However, this worked poorly in practice: raw Thrift is too low-level to use productively, and keeping pace with new API methods to support (for example) indexes in 0.7 or distributed counters in 0.8 is too much for many maintainers to keep pace with.
CQL, the Cassandra Query Language, addresses this by pushing all implementation details to the server; all the client has to know for any operation is how to interpret “resultset” objects. So adding a feature like counters just requires teaching the CQL parser to understand “column + N” notation; no client-side changes are necessary.
CQL drivers are also hosted in-tree, to avoid the problems caused in the past by client proliferation.
At the same time, CQL is heavily based on SQL–close to a subset of SQL, in fact, which is a big win for newcomers: everyone knows what “SELECT * FROM users” means. CQL is the first step to making the learning curve on the client side as gentle as it is operationally.
A taste of CQL
Here we’ll use the cqlsh tool to create a “users” columnfamily, add some users and an index, and do some simple queries. (See our documentation for details on installing cqlsh.) Compare to the same example using the old cli interface if you’re so inclined.
cqlsh> CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;
cqlsh> USE test;
cqlsh> CREATE COLUMNFAMILY users (
... key varchar PRIMARY KEY,
... full_name varchar,
... birth_date int,
... state varchar
cqlsh> CREATE INDEX ON users (birth_date);
cqlsh> CREATE INDEX ON users (state);
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('bsanderson', 'Brandon Sanderson', 1975, 'UT');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('prothfuss', 'Patrick Rothfuss', 1973, 'WI');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('htayler', 'Howard Tayler', 1968, 'UT');
cqlsh> SELECT key, state FROM users;
key | state |
bsanderson | UT |
prothfuss | WI |
htayler | UT |
cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
KEY | birth_date | full_name | state |
bsanderson | 1975 | Brandon Sanderson | UT |
Current status and the road ahead
cqlsh (included with the Python driver) is already useful for testing queries against your data and managing your schema. The above example works as written in 0.8.0; ALTER COLUMNFAMILY, TTL support, and counter support are already done for 0.8.1.
Supercolumn support is the main missing feature and will probably come in 0.8.2 (minor releases are done roughly monthly). Removing Thrift as a requirement is a longer-term goal.
The Python and Node.js CQL drivers are ready for wider use; JDBC and Twisted are almost done, and PHP and Ruby are being worked on.
In short, CQL is ready for client authors to get involved. Most application developers should stick with the old Thrift RPC-based clients until 0.8.1*. If you want to give CQL a try, see the installation instructions, and for developers, the drivers tree was recently moved here.
*One such client, Hector, already has CQL query support.