DataStax Developer Blog

Coming in Cassandra 1.2: binary CQL protocol

By Sylvain Lebresne - November 21, 2012 | 9 Comments

Cassandra 0.8 introduced the first version of the Cassandra Query Language (CQL) as an alternative to the Thrift API (so called because it is based on Apache Thrift). Cassandra 1.2 will ship the final version of the third revision of CQL (CQL3), with a number of features that we believe will provide a simpler API for Cassandra.

So far, CQL has used Thrift as its network transport. This was initially a matter of convenience: we wanted to focus on the language first, and Thrift was already there, gave us a transport for free, and is relatively fast. But CQL is in no way tied to Thrift for transport, and while Thrift has advantages, it also comes with a few limitations:

  • it is strictly an RPC mechanism. The server cannot, for instance, push information to the client.
  • it is a synchronous transport. A connection can only have one active request at a time.
  • while Thrift supports a fair number of languages, not every language is supported, and our experience has shown that not all language bindings are equal in terms of stability and performance. It also ties Cassandra's client-side language support directly to Thrift, which is not always a good thing (for instance, adding support for a new language to Thrift is much more involved than simply implementing the new protocol described in this post).

Also, Thrift is a generic framework, and we believe that a transport tailored specifically for Cassandra can give us more control and possibly better performance.

This led to the new binary protocol that Cassandra 1.2 introduces. It is a custom protocol designed specifically for Cassandra, and more precisely for CQL3 (that is, it only supports CQL3). Among other things, it offers the following features:

  • Asynchronous: each connection can handle more than one active request at the same time. In practice, this means that client libraries only need to keep a relatively small number of connections open to a given Cassandra node to achieve good performance. This matters particularly with Cassandra, where a client usually wants to keep connections to all (or at least a good part of) the nodes of the cluster, so a low number of connections per node helps scaling to large clusters.
    Technically, this is achieved by giving each message a stream ID and having the response to a request carry the same stream ID as the request. Clients can thus send multiple requests with different stream IDs on the same connection (i.e. without waiting for the response to one request before sending the next) while still being able to match each received response to the right request, even if the responses arrive in a different order than the requests were sent. This asynchronicity is of course optional: a client library can still choose to use the protocol synchronously if that is simpler.
  • Server notifications: the protocol allows clients to register for notifications of certain types of events. The currently supported events are cluster topology changes (a node joins the cluster, is removed, or moves), status changes (a node is detected up or down) and schema changes (the schema has been modified). When one of those events occurs, the server pushes a notification to the registered clients. This allows clients to keep their view of the Cassandra cluster up to date without having to poll the cluster regularly. More types of notifications may be added in the future, opening up a number of interesting possibilities.
  • Optional compression: protocol messages can optionally be compressed.
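To make the stream-ID mechanism and server notifications more concrete, here is a minimal Python sketch of how a client library might multiplex requests over a single connection. It is not taken from any existing driver: `StreamMultiplexer`, `submit` and `on_frame` are hypothetical names, and the socket is abstracted away as a `send_frame` callable. Two details come from our reading of the v1 specification rather than from this post, so treat them as assumptions: usable request stream IDs are 0 to 127 (matching the 128-requests-per-connection figure given in the comments below), and server-pushed events arrive on stream ID -1.

```python
from collections import deque
from concurrent.futures import Future

MAX_STREAMS = 128      # stream IDs 0..127, i.e. up to 128 requests in flight per connection
EVENT_STREAM_ID = -1   # assumption: server-pushed event notifications use stream ID -1


class StreamMultiplexer:
    """Hypothetical helper: assigns stream IDs and routes responses back to callers."""

    def __init__(self, on_event):
        self._free_ids = deque(range(MAX_STREAMS))  # IDs not currently in flight
        self._pending = {}                          # stream ID -> Future awaiting its response
        self._on_event = on_event                   # callback for pushed notifications

    def submit(self, send_frame, body):
        """Send a request without waiting for earlier ones; return a Future for its response."""
        if not self._free_ids:
            raise RuntimeError("all stream IDs are in use; wait for a response first")
        stream_id = self._free_ids.popleft()
        future = Future()
        self._pending[stream_id] = future
        send_frame(stream_id, body)
        return future

    def on_frame(self, stream_id, body):
        """Called for every frame read from the socket, in whatever order frames arrive."""
        if stream_id == EVENT_STREAM_ID:
            self._on_event(body)  # not a response: a notification the client registered for
            return
        future = self._pending.pop(stream_id)
        self._free_ids.append(stream_id)  # the ID can now be reused by a new request
        future.set_result(body)


# Two requests in flight on one connection; the responses come back out of order,
# with a pushed notification interleaved between them.
sent, events = [], []
mux = StreamMultiplexer(events.append)
first = mux.submit(lambda sid, body: sent.append((sid, body)), b"query 1")
second = mux.submit(lambda sid, body: sent.append((sid, body)), b"query 2")
mux.on_frame(1, b"rows for query 2")
mux.on_frame(EVENT_STREAM_ID, b"STATUS_CHANGE")
mux.on_frame(0, b"rows for query 1")
```

Because an ID only returns to the pool when its response arrives, the fixed pool also acts as simple back-pressure: a caller with every ID in flight has to wait for a response before sending more.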

Interested parties can find the full specification for this protocol for Cassandra 1.2 here.
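As a head start on the specification, the sketch below packs and unpacks the fixed 8-byte frame header as we understand it from the v1 spec: one byte each for version, flags, stream ID (signed) and opcode, followed by a 4-byte big-endian body length. The version values (0x01 for requests, high bit set for responses), the compression flag value and the opcode numbers are our reading of the spec and are not stated in this post, so check them against the specification itself; `encode_request` and `decode_header` are hypothetical names.

```python
import struct

# Assumed v1 header layout: version, flags, stream (signed), opcode, body length.
HEADER = struct.Struct(">BBbBI")  # big-endian, 1 + 1 + 1 + 1 + 4 = 8 bytes

REQUEST_VERSION = 0x01
RESPONSE_BIT = 0x80        # assumption: responses set the high bit of the version byte
FLAG_COMPRESSION = 0x01    # assumption: the body is compressed when this flag is set
OPCODES = {                # a subset of the opcodes, per our reading of the spec
    "STARTUP": 0x01, "READY": 0x02, "QUERY": 0x07,
    "RESULT": 0x08, "REGISTER": 0x0B, "EVENT": 0x0C,
}


def encode_request(stream_id, opcode, body, flags=0):
    """Prefix a request body with its 8-byte header."""
    return HEADER.pack(REQUEST_VERSION, flags, stream_id, opcode, len(body)) + body


def decode_header(data):
    """Parse the first 8 bytes of a frame into a dict of header fields."""
    version, flags, stream_id, opcode, length = HEADER.unpack_from(data)
    return {
        "response": bool(version & RESPONSE_BIT),
        "protocol_version": version & 0x7F,
        "compressed": bool(flags & FLAG_COMPRESSION),
        "stream": stream_id,
        "opcode": opcode,
        "length": length,
    }
```

Since the body length is in the header, a reader can always pull exactly 8 bytes, decode them, then read exactly `length` more bytes, which is what makes interleaving frames from different streams on one socket straightforward.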

So what if you want to give the new protocol a try? First, you need Cassandra 1.2 (at the time of writing, the most recent release is Cassandra 1.2.0-beta2, but release candidates should be out in the coming weeks and the final version should be released before the end of the year). Then, you need to enable the binary protocol server. Keep in mind that this protocol and its implementation are brand new. For that reason, the binary protocol server is not started by default (only the Thrift server is). You can change that by setting the start_native_transport option to true in cassandra.yaml (you can also set start_rpc to false if you're not going to use the Thrift interface). Lastly, you need a client driver that supports the new protocol. One such driver (itself still in beta) is the Java Driver that DataStax open-sourced a few days ago. But Cassandra 1.2 hasn't been released yet, and more drivers will come shortly.

We believe this new protocol is a good addition to Cassandra, and it already offers a number of advantages (asynchronicity, notifications, …), but this is definitely not the end of the road. In the short to medium term, we plan at least to:

  • Benchmark the new protocol: so far, we've focused on having a complete and usable protocol. In doing so, we have been careful to design a compact protocol, and the server-side implementation is based on Netty, which is known to perform well, but we have yet to benchmark it properly.
  • Add SSL support.
  • Add a cursor API: currently, as with Thrift, users must be careful not to issue requests that return too much data, because in that case everything is buffered server-side and returned to the client in one message. This puts the burden of paging large requests on the client, which is not ideal, and we plan to handle this at the protocol level by adding some form of streaming support.
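Until such a cursor API exists, client-side paging typically means issuing repeated bounded queries, each resuming after the last row of the previous page (in CQL3, for example, something along the lines of filtering on token() of the partition key with a LIMIT). The Python sketch below shows only the paging loop; `paged` and the `fetch_page(after, limit)` callable are hypothetical, and it assumes rows come back in a stable order and that the last row of a page is enough to resume from.

```python
from typing import Callable, Iterator, Optional, Sequence, TypeVar

Row = TypeVar("Row")


def paged(
    fetch_page: Callable[[Optional[Row], int], Sequence[Row]],
    page_size: int = 100,
) -> Iterator[Row]:
    """Yield every row by repeatedly fetching bounded pages.

    fetch_page(after, limit) is expected to return at most `limit` rows that
    come strictly after `after` (or from the start when `after` is None).
    """
    last = None
    while True:
        page = fetch_page(last, page_size)
        yield from page
        if len(page) < page_size:  # a short page means there is nothing left
            return
        last = page[-1]  # resume after the last row seen


# In-memory stand-in for the cluster: rows sorted in a stable order.
rows = list(range(10))


def fetch_from_list(after, limit):
    start = 0 if after is None else rows.index(after) + 1
    return rows[start:start + limit]
```

The server never has to buffer more than `page_size` rows per request, at the cost of one round trip per page, which is exactly the burden a protocol-level cursor would lift from the client.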

Comments

  1. Hossein Ghiyasi Mehr says:

    That’s OK. But I have two questions:
    1. How many requests can each connection handle at the same time in new protocol?
    2. In notification feature if a client add a new column to an existing row, will server notify client about adding new pair?

  2. Sylvain Lebresne says:

    How many requests can each connection handle at the same time in new protocol?

    128 (per connection).

    if a client add a new column to an existing row, will server notify client about adding new pair

    No. Currently we only notify for events that are useful to client libraries (like when a node joins the cluster), not for data-related events. But from the protocol's point of view, we could allow notifications for any kind of event, which means we may have data-related notifications in the future.

  3. Shahryar Sedghi says:

    Is there a plan to support the Cassandra JDBC driver with this version? I heard there is a new Java API, but I assume that is not JDBC yet.

  4. Sylvain Lebresne says:

    @Shahryar the binary protocol itself is obviously agnostic to any specific API. Concerning JDBC, I don’t think the cassandra-jdbc driver has been updated yet to support the binary protocol. As said above, the only Java driver for the binary protocol currently is probably the new DataStax Java driver (still in beta). It doesn’t support JDBC yet, but we do intend to provide a JDBC module in the future (which hopefully will be able to reuse the work done on the cassandra-jdbc driver), though I don’t know when that will happen.

  5. Jabbar Azam says:

    I’ve been looking at CQL 3 and I find it quite exciting.

    I’ve used the DataStax Java driver to talk to Cassandra 1.2. Unfortunately I need a .NET driver and I can’t find one.

    Is there a driver available somewhere?

    I am evaluating Cassandra and my colleague is evaluating MySQL Cluster. Obviously I prefer Cassandra, but I’m going to lose a lot of time if I have to develop a driver myself.

  6. Jabbar Azam says:

    Whoopee, I’ve found some .NET drivers which are CQL 3 compliant.

    Cassandra Sharp (updated 5 days ago)
    https://github.com/pchalamet/cassandra-sharp

    FluentCassandra (updated 2 days ago)
    https://github.com/managedfusion/fluentcassandra

  7. Raja James says:

    Currently, I use the Thrift API (protocol) to run CQL 3 queries against Cassandra 1.1.6. I would like to take advantage of the binary protocol supported in Cassandra 1.2. Is there a C++ driver available to run CQL 3 queries using the binary protocol?

    If not, what would you recommend me to use to perform asynchronous queries in C++ against Cassandra 1.2 ?

  8. Eugene OZ says:

    For the PDO driver for PHP
    http://code.google.com/a/apache-extras.org/p/cassandra-pdo/
    are there any plans to support the binary protocol?

  9. Marcos Trama says:

    I opened an issue for cassandra-pdo because CQL3 composite keys don’t work:
    http://code.google.com/a/apache-extras.org/p/cassandra-pdo/issues/detail?id=15
