Apache Cassandra 1.0 Documentation

CQL Language Reference

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your version.


Cassandra Query Language (CQL) is based on SQL (Structured Query Language), the standard for relational database manipulation. Although CQL has many similarities to SQL, there are some fundamental differences. For example, CQL is adapted to the Cassandra data model and architecture, so it does not allow SQL-like operations such as JOINs, or range queries over rows on clusters that use the random partitioner. This reference describes CQL 2.0.0.

CQL Lexical Structure

CQL input consists of statements. As in SQL, statements change data, look up data, store data, or change the way data is stored. All statements end in a semicolon (;).

For example, the following is valid CQL syntax:

SELECT * FROM MyColumnFamily;

UPDATE MyColumnFamily SET 'SomeColumn' = 'SomeValue' WHERE KEY = B70DE1D0-9908-4AE3-BE34-5573E5B09F14;

This is a sequence of two CQL statements. This example shows one statement per line, although a statement can usefully be split across lines as well.

CQL Identifiers and Keywords

String literals and identifiers, such as keyspace and column family names, are case-sensitive. For example, the identifiers MyColumnFamily and mycolumnfamily are not equivalent. CQL keywords are case-insensitive. For example, the keywords SELECT and select are equivalent, although this document shows keywords in uppercase.

Valid expressions consist of these kinds of values:

  • identifier: A letter followed by any sequence of letters, digits, or the underscore.
  • string literal: Characters enclosed in single quotation marks. To use a single quotation mark itself in a string literal, escape it using a single quotation mark. For example, ''.
  • integer: An optional minus sign, -, followed by one or more digits.
  • uuid: 32 hex digits, 0-9 or a-f, which are case-insensitive, separated by dashes, -, after the 8th, 12th, 16th, and 20th digits. For example: 01234567-0123-0123-0123-0123456789ab
  • float: A series of one or more decimal digits, followed by a period, ., and one or more decimal digits. There is no provision for exponential, e, notation, no optional + sign, and the forms .42 and 42. are unacceptable. Use leading or trailing zeros before and after decimal points. For example, 0.42 and 42.0.
  • whitespace: Separates terms and is significant inside string literals; otherwise CQL ignores it.
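
The rules in the list above can be approximated with regular expressions. The following Python sketch is illustrative only: the pattern names and the classify and quote helpers are assumptions made for this example, not part of CQL or of any Cassandra client, and the patterns follow the wording of the list rather than the actual CQL grammar.

```python
import re

# Patterns transcribed from the list above. All names here are hypothetical.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")   # letter, then letters/digits/underscore
INTEGER = re.compile(r"-?[0-9]+")                   # optional minus sign, then digits
UUID = re.compile(                                  # 32 hex digits grouped 8-4-4-4-12
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
FLOAT = re.compile(r"[0-9]+\.[0-9]+")               # digits on both sides; no sign or exponent is described
STRING = re.compile(r"'(?:[^']|'')*'")              # '' inside the quotes is an escaped quote


def classify(token):
    """Return the kind of value a token looks like, or None if no rule matches."""
    kinds = (
        ("uuid", UUID),
        ("float", FLOAT),
        ("integer", INTEGER),
        ("string literal", STRING),
        ("identifier", IDENTIFIER),
    )
    for kind, pattern in kinds:
        if pattern.fullmatch(token):
            return kind
    return None


def quote(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"
```

With these patterns, classify("0.42") reports a float, while classify(".42") and classify("42.") match no rule, which mirrors the restriction on leading and trailing zeros.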

CQL Data Types

Cassandra has a schema-optional data model. You can define data types when you create your column family schemas. Creating the schema is recommended, but not required. Column names, column values, and row key values can be typed in Cassandra.

CQL comes with the following built-in data types, which can be used for column names and column/row key values. One exception is counter, which is allowed only as a column value (not allowed for row key values or column names).

CQL Type    Description
ascii       US-ASCII character string
bigint      64-bit signed long
blob        Arbitrary bytes (no validation), expressed as hexadecimal
boolean     true or false
counter     Distributed counter value (64-bit long)
decimal     Variable-precision decimal
double      64-bit IEEE-754 floating point
float       32-bit IEEE-754 floating point
int         32-bit signed integer
text        UTF-8 encoded string
timestamp   Date plus time, encoded as 8 bytes since epoch
uuid        Type 1 or type 4 UUID
varchar     UTF-8 encoded string
varint      Arbitrary-precision integer

In addition to the CQL types listed in the previous table, you can use a string containing the name of a class (a sub-class of AbstractType loadable by Cassandra) as a CQL type. The class name should either be fully qualified or relative to the org.apache.cassandra.db.marshal package.

Working with Dates and Times

Values serialized with the timestamp type are encoded as 64-bit signed integers representing a number of milliseconds since the standard base time known as the epoch: January 1 1970 at 00:00:00 GMT.

Timestamp types can be input in CQL as simple long integers, giving the number of milliseconds since the epoch.

Timestamp types can also be input as string literals in any of the following ISO 8601 formats:

yyyy-MM-DD HH:mm
yyyy-MM-DD HH:mm:ss
yyyy-MM-DD HH:mmZ
yyyy-MM-DD HH:mm:ssZ
yyyy-MM-DD'T'HH:mm
yyyy-MM-DD'T'HH:mmZ
yyyy-MM-DD'T'HH:mm:ss
yyyy-MM-DD'T'HH:mm:ssZ
yyyy-MM-DD
yyyy-MM-DDZ

For example, for the date and time of Feb 3, 2011, at 04:05:00 AM, GMT:

2011-02-03 04:05+0000
2011-02-03 04:05:00+0000
2011-02-03T04:05+0000
2011-02-03T04:05:00+0000

The +0000 is the RFC 822 4-digit time zone specification for GMT. US Pacific Standard Time is -0800. The time zone may be omitted. For example:

2011-02-03 04:05
2011-02-03 04:05:00
2011-02-03T04:05
2011-02-03T04:05:00

If no time zone is specified, the time zone of the Cassandra coordinator node handling the write request is used. For accuracy, DataStax recommends specifying the time zone rather than relying on the time zone configured on the Cassandra nodes.

If you only want to capture date values, the time of day can also be omitted. For example:

2011-02-03
2011-02-03+0000

In this case, the time of day defaults to 00:00:00 in the specified or default time zone.
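
To make the relationship between the string formats and the stored value concrete, the following Python sketch converts a timestamp string to milliseconds since the epoch. It is an approximation for illustration, not Cassandra's own parser: the to_epoch_millis function and the FORMATS tuple are hypothetical names, the strptime patterns are this example's translation of the formats listed above, and the default_tz parameter stands in for the coordinator node's time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# strptime equivalents of the formats listed above. %z accepts RFC 822
# offsets such as +0000 or -0800.
FORMATS = (
    "%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M%z", "%Y-%m-%d %H:%M:%S%z",
    "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M%z",
    "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d", "%Y-%m-%d%z",
)


def to_epoch_millis(text, default_tz=timezone.utc):
    """Convert a timestamp string to milliseconds since the epoch.

    default_tz is an assumption of this sketch: it plays the role of the
    coordinator node's time zone when the string carries no offset.
    """
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=default_tz)
        # Integer division of timedeltas avoids floating-point rounding.
        return (parsed - EPOCH) // timedelta(milliseconds=1)
    raise ValueError("unrecognized timestamp format: %r" % text)
```

For example, to_epoch_millis("2011-02-03 04:05+0000") yields 1296705900000, and a date-only value such as "2011-02-03+0000" yields 1296691200000, that is, 00:00:00 on that day.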

CQL Comments

Comments can be used to document CQL statements in your application code. Single line comments can begin with a double dash (--) or a double slash (//) and extend to the end of the line. Multi-line comments can be enclosed in /* and */ characters.
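
As a rough illustration of the three comment forms, the following Python sketch removes comments from a block of CQL text. The strip_comments function is a hypothetical helper, not part of cqlsh or any driver. It assumes that comment markers appearing inside a string literal are ordinary characters, which this document does not state explicitly.

```python
import re

# One alternation: string literals are matched first so that comment
# markers inside them are left alone (an assumption of this sketch).
_TOKEN = re.compile(
    r"'(?:[^']|'')*'"   # string literal, with '' as an escaped quote
    r"|--[^\n]*"        # single-line comment starting with --
    r"|//[^\n]*"        # single-line comment starting with //
    r"|/\*.*?\*/",      # multi-line comment between /* and */
    re.DOTALL,
)


def strip_comments(cql):
    """Return cql with comments removed and string literals untouched."""
    def replace(match):
        text = match.group(0)
        if text.startswith("'"):
            return text              # keep string literals as written
        if text.startswith("/*"):
            return " "               # keep neighboring terms separated
        return ""                    # drop single-line comments
    return _TOKEN.sub(replace, cql)
```

The newline that ends a single-line comment is preserved, so line numbers in the remaining text do not shift for those comments.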

Specifying Consistency Level

In Cassandra, consistency refers to how up-to-date and synchronized a row of data is on all of its replica nodes. For any given read or write operation, the client request specifies a consistency level, which determines how many replica nodes must successfully respond to the request.

In CQL, the default consistency level is ONE. You can set the consistency level for any read (SELECT) or write (INSERT, UPDATE, DELETE, BATCH) operation. For example:

SELECT * FROM users WHERE state='TX' USING CONSISTENCY QUORUM;

Consistency level specifications are made up of the keywords USING CONSISTENCY, followed by a consistency level identifier. Valid consistency level identifiers are:

  • ANY (applicable to writes only)
  • ONE (default)
  • QUORUM
  • LOCAL_QUORUM (applicable to multi-data center clusters only)
  • EACH_QUORUM (applicable to multi-data center clusters only)
  • ALL

See tunable consistency for more information about the different consistency levels.
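
An application that builds CQL strings could check the level before sending a request. The following Python sketch is one way to do that. The consistency_clause helper is hypothetical, it only produces the clause text described above, and it accepts the identifiers exactly as listed.

```python
# Identifiers copied from the list above.
CONSISTENCY_LEVELS = ("ANY", "ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL")
WRITE_ONLY_LEVELS = {"ANY"}


def consistency_clause(level="ONE", is_write=False):
    """Return the USING CONSISTENCY clause for a level, or raise ValueError.

    Hypothetical helper: is_write tells the sketch whether the statement is
    a write (INSERT, UPDATE, DELETE, BATCH) or a read (SELECT).
    """
    if level not in CONSISTENCY_LEVELS:
        raise ValueError("unknown consistency level: %r" % level)
    if level in WRITE_ONLY_LEVELS and not is_write:
        raise ValueError("%s is applicable to writes only" % level)
    return "USING CONSISTENCY " + level
```

Calling consistency_clause("QUORUM") returns the clause used in the SELECT example above, while consistency_clause("ANY") raises an error unless is_write is set.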

CQL Storage Parameters

Some CQL commands accept a WITH clause that sets properties on a keyspace or column family. CQL currently supports only a subset of the available properties.

CQL Keyspace Storage Parameters

CQL supports setting the following keyspace properties.

  • strategy_class The name of the replication strategy: SimpleStrategy or NetworkTopologyStrategy
  • strategy_options Replication strategy option names are appended to the strategy_options keyword using a colon (:). For example: strategy_options:DC1='1' or strategy_options:replication_factor='3'
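
The colon notation can be generated mechanically. The following Python sketch formats keyspace properties in the form shown above. The keyspace_properties helper is hypothetical, and joining the properties with AND is an assumption of this example rather than something this section specifies.

```python
STRATEGY_CLASSES = ("SimpleStrategy", "NetworkTopologyStrategy")


def keyspace_properties(strategy_class, **strategy_options):
    """Format keyspace properties, one strategy_options:<name> per option.

    Hypothetical helper. Joining the parts with AND is an assumption.
    """
    if strategy_class not in STRATEGY_CLASSES:
        raise ValueError("unknown replication strategy: %r" % strategy_class)
    parts = ["strategy_class='%s'" % strategy_class]
    for name, value in strategy_options.items():
        # Option names are appended to the keyword with a colon.
        parts.append("strategy_options:%s='%s'" % (name, value))
    return " AND ".join(parts)
```

For example, keyspace_properties("SimpleStrategy", replication_factor=3) produces strategy_class='SimpleStrategy' AND strategy_options:replication_factor='3'.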

CQL Column Family Storage Parameters

CQL supports setting the following column family properties, which in a few cases have slightly different names than their corresponding column family attributes.

CQL Parameter Name            Default Value
compaction_strategy_class     SizeTieredCompactionStrategy
compaction_strategy_options   none
compression_options           none
comparator                    text
comment                       '' (an empty string)
default_validation            text
gc_grace_seconds              864000
min_compaction_threshold      4
max_compaction_threshold      32
read_repair_chance            1.0
replicate_on_write            false

compaction_strategy_class in CQL corresponds to the compaction_strategy attribute. default_validation in CQL corresponds to the default_validation_class attribute.
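
The defaults and the two name differences can be captured in a small lookup, as in the following Python sketch. The dictionary and both helper functions are hypothetical names for this example. The values are copied from the table above, with "none" represented as None.

```python
# Defaults copied from the table above ("none" is represented as None).
CF_DEFAULTS = {
    "compaction_strategy_class": "SizeTieredCompactionStrategy",
    "compaction_strategy_options": None,
    "compression_options": None,
    "comparator": "text",
    "comment": "",
    "default_validation": "text",
    "gc_grace_seconds": 864000,
    "min_compaction_threshold": 4,
    "max_compaction_threshold": 32,
    "read_repair_chance": 1.0,
    "replicate_on_write": False,
}

# CQL parameter names that differ from the column family attribute names.
ATTRIBUTE_NAMES = {
    "compaction_strategy_class": "compaction_strategy",
    "default_validation": "default_validation_class",
}


def effective_parameters(**overrides):
    """Merge overrides into the defaults, rejecting names not in the table."""
    unknown = sorted(set(overrides) - set(CF_DEFAULTS))
    if unknown:
        raise ValueError("unsupported CQL parameters: %s" % ", ".join(unknown))
    merged = dict(CF_DEFAULTS)
    merged.update(overrides)
    return merged


def attribute_name(cql_name):
    """Map a CQL parameter name to its column family attribute name."""
    return ATTRIBUTE_NAMES.get(cql_name, cql_name)
```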

The CQL Shell Program

Using the CQL client, cqlsh, you can query the Cassandra database from the command line. All of the commands included in CQL are available on the cqlsh command line, plus the following commands: