Sylvain Lebresne

<p>At the time of this writing, Cassandra 2.0.0-beta2 has just been released, and it shouldn't be too long before the final release lands. In this blog post, we describe the improvements and new features that C* 2.0 brings on the CQL front. Note that none of these improvements are backward-incompatible changes to the language; they are purely additions to it.</p>

<h2>ALTER DROP</h2>

<p>CQL3 in Cassandra 1.2 does not allow you to drop a CQL3 column. This is fixed in Cassandra 2.0, where if you have</p>

<pre>
<code>    CREATE TABLE myTable (
        id text PRIMARY KEY,
        prop1 text,
        prop2 int,
        prop3 float
    )
</code></pre>

<p><br />
then you are allowed to do:</p>

<pre>
<code>    ALTER TABLE myTable DROP prop3;
</code></pre>

<p>&nbsp;</p>

<p>As expected of such a statement, it drops&nbsp;<tt>prop3</tt>&nbsp;from the table definition and also removes all data pertaining to that column from the database. The&nbsp;<tt>ALTER TABLE</tt>&nbsp;statement itself returns quickly, because it simply updates the table metadata to register the drop. The data removal is performed lazily during compaction: compaction looks for dropped columns in the input sstables and doesn't include them in the output. If you want to force the removal of dropped columns without waiting for compactions to kick in automatically, you can call&nbsp;<tt>nodetool upgradesstables</tt>.</p>
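
<p>As a rough sketch of what this looks like from the client side (the inserted row is purely illustrative, and the comments describe the behavior we would expect rather than actual output):</p>

<pre>
<code>    -- Illustrative row, written before the column is dropped
    INSERT INTO myTable (id, prop1, prop2, prop3)
                 VALUES ('a', 'x', 1, 1.5);

    ALTER TABLE myTable DROP prop3;

    -- prop3 is no longer part of the schema, so only
    -- id, prop1 and prop2 should come back
    SELECT * FROM myTable;

    -- Expected to be rejected: PRIMARY KEY columns cannot be dropped
    ALTER TABLE myTable DROP id;
</code></pre>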

<h2>Conditional updates</h2>

<p>Cassandra 2.0 introduces some support for&nbsp;<a href="https://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0">lightweight transactions</a>&nbsp;(using Paxos underneath). On the CQL front, this is exposed through the support of the&nbsp;<tt>IF</tt>&nbsp;keyword in&nbsp;<tt>INSERT</tt>,&nbsp;<tt>UPDATE</tt>&nbsp;and&nbsp;<tt>DELETE</tt>&nbsp;statements (the previous&nbsp;<a href="https://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0">blog post</a>&nbsp;on these lightweight transactions provides examples of this support in CQL so we don't repeat them here).</p>

<h2>Conditional schema modifications</h2>

<p>Schema modification statements (<tt>CREATE/DROP KEYSPACE/TABLE/INDEX</tt>) also support a form of conditionals in Cassandra 2.0. This is particularly convenient if, at the start of some insertion code, you don't know whether a keyspace (or table, or index) exists or needs to be created. For that, you can now do:</p>

<pre>
<code>    CREATE KEYSPACE IF NOT EXISTS ks
               WITH replication = { 'class': 'SimpleStrategy',
                                    'replication_factor' : 3 };
    CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);
</code></pre>

<p><br />
Similarly, you can issue conditional drops:</p>

<pre>
<code>    DROP KEYSPACE IF EXISTS ks;
</code></pre>
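
<p>The conditional forms should work the same way for tables and indexes. A minimal sketch, where the table&nbsp;<tt>events</tt>, its column&nbsp;<tt>kind</tt>&nbsp;and the index name&nbsp;<tt>events_kind_idx</tt>&nbsp;are hypothetical names chosen for illustration:</p>

<pre>
<code>    -- Hypothetical table and index, for illustration only
    CREATE TABLE IF NOT EXISTS events (id int PRIMARY KEY, kind text);
    CREATE INDEX IF NOT EXISTS events_kind_idx ON events (kind);

    -- Each statement is a no-op if its target does not exist
    DROP INDEX IF EXISTS events_kind_idx;
    DROP TABLE IF EXISTS events;
</code></pre>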

<p>&nbsp;</p>

<p>This syntax is merely a convenience: for instance, a (non-conditional)&nbsp;<tt>CREATE KEYSPACE</tt>&nbsp;statement will throw a specific exception if the keyspace already exists. So you can also issue such a non-conditional creation and ignore the exception if it is thrown, but the new syntax offers a more concise way to achieve the same effect.</p>

<h2>Triggers</h2>

<p>Cassandra 2.0 also introduces experimental support for triggers and CQL offers a new syntax to register a trigger on a table, namely:</p>

<pre>
<code>    CREATE TRIGGER myTrigger
                ON myTable
             USING 'org.apache.cassandra.triggers.InvertedIndex'
</code></pre>

<p><br />
where&nbsp;<tt>'org.apache.cassandra.triggers.InvertedIndex'</tt>&nbsp;is the Java class implementing the trigger in that example (this class is available&nbsp;<a href="https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=examples/triggers/src/org/apache/cassandra/triggers/InvertedIndex.java;hb=cassandra-2.0">here</a>&nbsp;if you are interested). It is of course also possible to drop such a trigger:</p>

<pre>
<code>    DROP TRIGGER myTrigger ON myTable
</code></pre>

<p>&nbsp;</p>

<p>Please note, however, that trigger support is currently experimental. In particular, while the CQL syntax above is unlikely to change, the Java interface that triggers currently need to implement will probably change.</p>

<h2>Secondary indexes on PRIMARY KEY columns</h2>

<p>In Cassandra 1.2, you can only create secondary indexes on CQL3 columns that are not part of the&nbsp;<tt>PRIMARY KEY</tt>&nbsp;definition. In other words, the following does not work in C* 1.2 but will in C* 2.0:</p>

<pre>
<code>    CREATE TABLE timeline (
        event_id uuid,
        week_in_year int,
        created_at timeuuid,
        content blob,
        PRIMARY KEY ((event_id, week_in_year), created_at)
    );
    
    -- Invalid in Cassandra 1.2 but not in 2.0
    CREATE INDEX ON timeline (week_in_year);
</code></pre>
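
<p>With that index in place, a query restricted only on the indexed column should become possible. A sketch, assuming the&nbsp;<tt>timeline</tt>&nbsp;table and index above (the value&nbsp;<tt>30</tt>&nbsp;is arbitrary):</p>

<pre>
<code>    -- Served by the secondary index: week_in_year alone does not
    -- identify a partition, since the partition key is
    -- (event_id, week_in_year)
    SELECT * FROM timeline WHERE week_in_year = 30;
</code></pre>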

<p>&nbsp;</p>

<p>Note, however, that secondary indexing of collection columns is not yet supported in C* 2.0. This will come later.</p>

<h2>Aliases in SELECT</h2>

<p>CQL supports a number of function calls on the column names selected by a&nbsp;<tt>SELECT</tt>. For instance, in cqlsh you can do:</p>

<pre>
<code>cqlsh:ks&gt; SELECT event_id, dateOf(created_at), blobAsText(content) 
            FROM timeline;

 event_id                | dateOf(created_at)       | blobAsText(content)
-------------------------+--------------------------+----------------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Something happened!?
</code></pre>

<p><br />
While this is fine, it might not be convenient in practice to refer to a column named&nbsp;<tt>dateOf(created_at)</tt>&nbsp;in the result set, so a new alias feature has been added and you can now do:</p>

<pre>
<code>cqlsh:ks&gt; SELECT event_id, 
                 dateOf(created_at) AS creation_date,
                 blobAsText(content) AS content 
            FROM timeline;

 event_id                | creation_date            | content
-------------------------+--------------------------+----------------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Something happened!?
</code></pre>
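
<p>Aliases should work the same way for the other selection functions. For instance, a sketch using&nbsp;<tt>writetime</tt>&nbsp;and&nbsp;<tt>ttl</tt>&nbsp;on the same table (the alias names are arbitrary):</p>

<pre>
<code>    SELECT event_id,
           writetime(content) AS written_at,
           ttl(content) AS remaining_ttl
      FROM timeline;
</code></pre>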

<p>&nbsp;</p>

<h2>Preparing timestamp, ttl and limit</h2>

<p>Cassandra 1.2 doesn't allow you to use a bind marker for the&nbsp;<tt>TIMESTAMP</tt>&nbsp;and&nbsp;<tt>TTL</tt>&nbsp;properties of update statements, nor for the&nbsp;<tt>LIMIT</tt>&nbsp;property of&nbsp;<tt>SELECT</tt>&nbsp;statements. This is now fixed and you can for instance prepare statements like:</p>

<pre>
<code>    SELECT * FROM myTable LIMIT ?;
    UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';
</code></pre>
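
<p>Bind markers can also be combined with the usual markers for values. A sketch, assuming the same&nbsp;<tt>myTable</tt>&nbsp;with columns&nbsp;<tt>k</tt>&nbsp;and&nbsp;<tt>v</tt>&nbsp;as in the statements above:</p>

<pre>
<code>    -- TTL and TIMESTAMP both supplied at execution time
    INSERT INTO myTable (k, v) VALUES (?, ?) USING TTL ? AND TIMESTAMP ?;
    UPDATE myTable USING TIMESTAMP ? SET v = ? WHERE k = ?;
    DELETE FROM myTable USING TIMESTAMP ? WHERE k = ?;
</code></pre>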

<p>&nbsp;</p>

<h2>Native protocol improvements</h2>

<p>On top of the improvements to the CQL language described above, Cassandra 2.0 introduces the second version of the native protocol for CQL, which brings a number of improvements over the first version. Client drivers will need to be modified to support the new protocol version and its improvements, but Cassandra 2.0 still supports the first version of the protocol, so client drivers that work against C* 1.2 will work with C* 2.0.</p>

<p>The main improvements made in native protocol version 2 are:</p>

<ul>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-4693">Batching of prepared statements</a>: this allows individually prepared statements to be executed in a&nbsp;<tt>BATCH</tt>.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-5349">One-off prepare and execute statements</a>: this allows values for a statement to be passed as binary (to avoid converting blobs to strings, for instance) even when you don't want to prepare the statement (because you don't plan on executing that query more than once).</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-4415">Automatic/incremental paging of SELECT statements</a>: with Cassandra 1.2, the result set for a&nbsp;<tt>SELECT</tt>&nbsp;query is always sent in its entirety, which might lead to query timeouts and/or out-of-memory exceptions (on either the server or the client side) if the result set is big. Users therefore need to be wary of that problem and should always include a&nbsp;<tt>LIMIT</tt>&nbsp;if the select may yield too many results. This is rather inconvenient and error-prone, and native protocol version 2 fixes it by allowing incremental fetching of a result set, transparently for the user.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-5545">SASL for authentication</a>: the first version of the native protocol has custom and relatively limited authentication support. As a consequence, it is hard (and very inconvenient, if not impossible) to provide secure authentication. Version 2 improves on that situation by replacing this custom mechanism with SASL authentication.</li>
	<li><a href="https://issues.apache.org/jira/browse/CASSANDRA-5649">More compact result sets for prepared statements</a></li>
</ul>

<p>We will describe those improvements more precisely (and with examples) in a follow-up blog post, so stay tuned.</p>


CQL improvements in Cassandra 2.0
