Apache Cassandra 1.1 Documentation

Querying Cassandra

This document corresponds to an earlier product version. Make sure you are using the version that corresponds to your version.

Latest Cassandra documentation | Earlier Cassandra documentation

Cassandra CQL now has parity with the legacy API and command line interface (CLI) that has shipped with Cassandra for several years, making CQL the primary interface into the DBMS. CQL 3 is beta in Cassandra 1.1. The released version will be available in Cassandra 1.2.

CLI commands have complements in CQL based on the CQL specification 3. CQL specification 2 remains the default because CQL 2 applications are incompatible with the CQL specification 3. The compact storage directive used with the CREATE TABLE command provides backward compatibility with older Cassandra applications; new applications should avoid it.

You activate the CQL specification in one of these ways:

  • Start the CQL shell utility using the cql<specification number> startup option.

  • Use the set_sql_version Thrift method.

  • Specify the desired CQL version in the connect() call to the Python driver:

    connection = cql.connect('localhost:9160', cql_version='3.0')
    

CQL Specification 3 supports composite columns. By virtue of the composite primary keys feature in CQL, wide rows are supported with full denormalization.

CQL Specification 2 supports dynamic columns, but not composite columns.

Both versions of CQL support only a subset of all the available column family storage properties. Also, Super Columns are not supported by either CQL version; column_type and subcomparator arguments are not valid.

Using CQL

Developers can access CQL commands in a variety of ways. Drivers are available for Python, PHP, Ruby, Node.js, and JDBC-based client programs. For the purposes of administrators, using the Python-based CQLsh command-line client is the most direct way to run simple CQL commands. Using the CQLsh client, you can run CQL commands from the command line. The CQLsh client is installed with Cassandra in <install_location>/bin/cqlsh for tarball installations, or /usr/bin/cqlsh for packaged installations.

When you start CQLsh, you can provide the IP address of a Cassandra node to connect to. The default is localhost. You can also provide the RPC connection port (default is 9160), and the cql specification number.

Starting CQLsh Using the CQL 2 Specification

Linux

To start CQLsh 2.0.0 using the default CQL 2 Specification from the Cassandra bin directory on Linux, for example:

./cqlsh

Windows

To start CQLsh 2.0.0 using the default CQL 2 Specification from the Cassandra bin directory on Windows in the Command Prompt, for example:

python cqlsh

Starting CQLsh Using the CQL 3 Specification

Linux

To start CQLsh using the CQL 3 Specification on Linux, for example, navigate to the bin directory and enter this command:

./cqlsh -3

To start CQLsh on a different node, specify the IP address and port:

./cqlsh 1.2.3.4 9160 -3

Windows

To start CQLsh using the CQL 3 Specification on Windows, for example, in Command Prompt navigate to the bin directory and enter this command:

python cqlsh -3

To start CQLsh on a different node, specify the IP address and port:

python cqlsh 1.2.3.4 9160 -3

Windows and Linux

To exit CQLsh:

cqlsh> exit

Using CQL Commands

Commands in CQL combine SQL-like syntax that maps to Cassandra concepts and operations. CQL 3, like CQL 2, supports tab completion. Some platforms, such as Mac OSX, do not ship with tab completion installed. You can use easy_install to install tab completion capabilities on OSX:

easy_install readline

This section presents an overview of the CQL 3 commands. These commands are described in detail in the CQL Reference.

Note

The examples in this documentation use CQL 3 and, in some cases, do not work if you use the default CQL 2.

For a description of CQL 2, see the CQL reference for Cassandra 1.0.

Creating a Keyspace

If you haven't already started CQLsh 3, do so.

./cqlsh -3

To create a keyspace, the CQL counterpart to the SQL database, use the CREATE KEYSPACE command.

Note

This example works with pre-release CQL 3 in Cassandra 1.1.x. The syntax differs in the release version of CQL 3 in Cassandra 1.2 and later.

cqlsh> CREATE KEYSPACE demodb
         WITH strategy_class = 'SimpleStrategy'
         AND strategy_options:replication_factor='1';

The strategy_class keyspace option must be enclosed in single quotation marks. For more information about keyspace options, see About Replication in Cassandra.

Using the Keyspace

After creating a keyspace, select the keyspace for use, just as you connect to a database in SQL:

cqlsh> USE demodb;

Next, create a column family, the counterpart to a table in SQL, and then populate it with data.

Creating a Column Family

To create a users column family in the newly created keyspace:

cqlsh:demodb> CREATE TABLE users (
                user_name varchar,
                password varchar,
                gender varchar,
                session_token varchar,
                state varchar,
                birth_year bigint,
                PRIMARY KEY (user_name)
              );

The users column family has a single primary key.

After executing the CREATE TABLE command, if you get no output, all is well. If you get an error message, you probably started CQLsh 2 instead of CQLsh 3.

Inserting and Retrieving Columns

In production scenarios, inserting columns and column values programmatically is more practical than using CQLsh, but often, being able to test queries using this SQL-like shell is very convenient.

The following example shows how to use CQLsh to create and then get a user record for "jsmith." The record includes a value for the password column. The user name "jsmith" is the row key, or in CQL terms, the primary key.

cqlsh:demodb> INSERT INTO users
                (user_name, password)
                VALUES ('jsmith', 'ch@ngem3a');

cqlsh:demodb> SELECT * FROM users WHERE user_name='jsmith';

Using the Keyspace Qualifier

Sometimes it is inconvenient to have to issue a USE statement to select a keyspace. If you use connection pooling, for example, you have multiple keyspaces to juggle. Simplify tracking multiple keyspaces using the keyspace qualifier. Use the name of the keyspace followed by a period, then the column family name. For example, History.timeline.

cqlsh:demodb> INSERT INTO History.timeline (user_id, tweet_id, author, body)
                VALUES (gmason, 1765, phenry, 'Give me liberty or give me death');

You can specify the keyspace you want to use in these statements:

  • SELECT
  • CREATE
  • ALTER
  • DELETE
  • TRUNCATE

For more information, see the CQL Reference.

Using Composite Primary Keys

Use a composite primary key when you need wide row support or when you want to create columns that you can query to return sorted results. To create a column family having a composite primary key:

cqlsh:demodb> CREATE TABLE emp (
                empID int,
                deptID int,
                first_name varchar,
                last_name varchar,
                PRIMARY KEY (empID, deptID)
              );

The composite primary key is made up of the empID and deptID columns. The empID acts as a partition key for distributing data in the column family among the various nodes that comprise the cluster. The remaining column, the deptID, in the primary key acts as a clustering mechanism and ensures the data is stored in ascending order on disk (much like a clustered index in Microsoft SQL Server works). For more information about composite keys, see the Composite Columns section.

To insert data into the column family:

cqlsh:demodb> INSERT INTO emp (empID, deptID, first_name, last_name)
                VALUES (104, 15, 'jane', 'smith');

If you want to specify a keyspace other than the one you're using, prefix the keyspace name followed by a period (.) to the column family name:

cqlsh> INSERT INTO demodb.emp
         (empID, deptID, first_name, last_name)
         VALUES (130, 5, 'sughit', 'singh');

Retrieving and Sorting Results

Similar to a SQL query, follow the WHERE clause by an ORDER BY clause to retrieve and sort results:

cqlsh:demodb> SELECT * FROM emp WHERE empID IN (130,104) ORDER BY deptID DESC;

 empid | deptid | first_name | last_name
-------+--------+------------+-----------
   104 |     15 |       jane |     smith
   130 |      5 |     sughit |     singh

cqlsh:demodb> SELECT * FROM emp where empID IN (130,104) ORDER BY deptID ASC;

 empid | deptid | first_name | last_name

-------+--------+------------+-----------
   130 |      5 |     sughit |     singh
   104 |     15 |       jane |     smith

See the tweets example for more information about using composite primary keys.

Adding Columns with ALTER TABLE

The ALTER TABLE command adds new columns to a column family. For example, to add a coupon_code column with the varchar validation type to the users column family:

cqlsh:demodb> ALTER TABLE users ADD coupon_code varchar;

This creates the column metadata and adds the column to the column family schema, but does not update any existing rows.

Altering Column Metadata

Using ALTER TABLE, you can change the type of a column after it is defined or added to a column family. For example, to change the coupon_code column to store coupon codes as integers instead of text, change the validation type as follows:

cqlsh:demodb> ALTER TABLE users ALTER coupon_code TYPE int;

Only newly inserted values, not existing coupon codes are validated against the new type.

Specifying Column Expiration with TTL

Both the INSERT and UPDATE commands support setting a column expiration time (TTL). In the INSERT example above for the key jsmith we set the password column to expire at 86400 seconds, or one day. If we wanted to extend the expiration period to five days, we could use the UPDATE command a shown:

cqlsh:demodb> INSERT INTO users
                (user_name, password)
                VALUES ('cbrown', 'ch@ngem4a') USING TTL 86400;

cqlsh:demodb> UPDATE users USING TTL 432000 SET 'password' = 'ch@ngem4a'
                WHERE user_name = 'cbrown';

Dropping Column Metadata

To remove a column's metadata entirely, including the column name and validation type, use ALTER TABLE <columnFamily> DROP <column>. The following command removes the name and validator without affecting or deleting any existing data:

cqlsh:demodb> ALTER TABLE users DROP coupon_code;

After you run this command, clients can still add new columns named coupon_code to the users column family, but the columns are not validated until you explicitly add a type.

Indexing a Column

CQLsh can be used to create secondary indexes, or indexes on column values. This example creates an index on the state and birth_year columns in the users column family.

cqlsh:demodb> CREATE INDEX state_key ON users (state);
cqlsh:demodb> CREATE INDEX birth_year_key ON users (birth_year);

Because of the secondary index created for the two columns, their values can be queried directly as follows:

cqlsh:demodb> SELECT * FROM users
                WHERE gender='f' AND
                state='TX' AND
                birth_year='1968';

Deleting Columns and Rows

CQLsh provides the DELETE command to delete a column or row. This example deletes user jsmith's session token column, and then delete jsmith's entire row.

cqlsh:demodb> DELETE session_token FROM users where pk = 'jsmith';
cqlsh:demodb> DELETE FROM users where pk = 'jsmith';

Dropping Column Families and Keyspaces

Using CQLsh commands, you can drop column families and keyspaces in much the same way that tables and databases are dropped in relational models. This example shows the commands to drop our example users column family and then drop the demodb keyspace:

cqlsh:demodb> DROP TABLE users;
cqlsh:demodb> DROP KEYSPACE demodb;

Paging through Non-ordered Partitioner Results

When using the RandomPartitioner, Cassandra rows are ordered by the MD5 hash of their value and hence the order of rows is not meaningful. Despite that fact, given the following definition:

CREATE TABLE test (
  k int PRIMARY KEY,
  v1 int,
  v2 int
);

and assuming RandomPartitioner, CQL2 allowed queries like:

SELECT * FROM test WHERE k > 42;

The semantics of such a query was to query all rows for which the MD5 of the key was bigger than the MD5 of 42. The query would return results where k ≤ 42, which is unintuitive. CQL3 forbids such a query unless the partitioner in use is ordered.

Even when using the random partitioner, it can sometimes be useful to page through all rows. For this purpose, CQL3 includes the token function:

SELECT * FROM test WHERE token(k) > token(42);

The ByteOrdered partitioner arranges tokens the same way as key values, but the RandomPartitioner distributes tokens in a completely unordered manner. Using the token function actually queries results directly using tokens. Underneath, the token function makes token-based comparisons and does not convert keys to tokens (not k > 42).

Querying System Tables

An alternative to the Thrift API describe_keyspaces function is querying the system tables directly in CQL 3. For example, you can query the defined keyspaces as follows:

SELECT * from system.schema_keyspaces;

The output includes information about defined keyspaces. For example:

 keyspace | durable_writes | name    | strategy_class | strategy_options
----------+----------------+---------+----------------+----------------------------
  history |           True | history | SimpleStrategy | {"replication_factor":"1"}
  ks_info |           True | ks_info | SimpleStrategy | {"replication_factor":"1"}

You can also retreive information about tables by querying system.schema_columnfamilies and about column metadata by querying system.schema_columns.