Apache Cassandra 0.7 Documentation

Using the Cassandra CLI

The directory CASSANDRA_HOME/bin contains a startup script for launching the CLI. Running CASSANDRA_HOME/bin/cassandra-cli will display a usage list of the valid arguments and options.

To start the CLI and connect to a particular Cassandra instance, launch the script with the -host and -port arguments. In this example, we connect to the default instance, “Test Cluster”:

$ ./cassandra-cli -host localhost -port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.
Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown]

As the screen output suggests, you can enter a question mark (?) or help; for more information about commands. For detailed help on a specific command, use help <command>;.

Note

For every command or statement you enter into the CLI, make sure you end it with a semicolon before pressing the return key. If you forget, the CLI echoes an ellipsis (...), which indicates that it expects more input, such as a semicolon or, in other cases, names and values.
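The behavior described in this note can be pictured with a toy model. The Python sketch below is not the CLI's actual implementation; split_statements is a hypothetical helper that groups input lines until one ends with a semicolon, much as the CLI keeps prompting until a statement is terminated.

```python
def split_statements(lines):
    """Group raw input lines into semicolon-terminated statements.

    Returns (complete_statements, pending_lines). A non-empty pending list
    corresponds to the CLI still waiting for more input.
    """
    statements = []
    pending = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines neither start nor end a statement
        pending.append(text)
        if text.endswith(";"):
            statements.append(" ".join(pending))
            pending = []
    return statements, pending


done, waiting = split_statements([
    "create keyspace twissandra with replication_factor=1",
    "and placement_strategy='org.apache.cassandra.locator.SimpleStrategy';",
    "show keyspaces",  # no semicolon yet, so this stays pending
])
print(done)
print(waiting)
```

In this model, the last line remains in the pending list because it has no terminating semicolon, which is the situation in which the CLI would echo the ellipsis.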

Creating a Keyspace

You can use the Cassandra CLI commands described in this section to create a keyspace. In creating an example keyspace for Twissandra, we will assume a desired replication factor of 1 and implementation of the SimpleStrategy replica placement strategy. For more information on these keyspace options, see Clustering in the DataStax reference documentation.

Note the single quotes around the string value of placement_strategy:

[default@unknown] create keyspace twissandra with replication_factor=1
and placement_strategy='org.apache.cassandra.locator.SimpleStrategy';

You can verify the creation of a keyspace with the show keyspaces command. The new keyspace is listed along with the system keyspace and any other existing keyspaces.

Creating a Column Family

For this example, we use the CLI to create a users column family in the example Twissandra keyspace. The column metadata names the password column and sets its validation class to UTF8Type, so that values written to that column are validated as UTF-8 text.

Note the use command to connect to the twissandra keyspace.

[default@unknown] use twissandra;
Authenticated to keyspace: twissandra
[default@twissandra] create column family users with comparator = UTF8Type
...     and column_metadata = [{column_name: password, validation_class:
...     UTF8Type}];
ade3bc44-236f-11e0-8410-56547f39a44b

Similar commands to create the column families for Twissandra tweets, friends, followers, userline, and timeline would look like the following:

[default@twissandra] create column family tweets with comparator = UTF8Type and
column_metadata = [{column_name: body, validation_class:
UTF8Type}, {column_name: username, validation_class: UTF8Type}];
ba95d891-2cb5-11e0-9c0d-e700f669bcfc

[default@twissandra] create column family friends with comparator = UTF8Type;
71f22752-2cb6-11e0-9c0d-e700f669bcfc

[default@twissandra] create column family followers with comparator = UTF8Type;
81067983-2cb6-11e0-9c0d-e700f669bcfc

[default@twissandra] create column family userline with comparator = LongType and
default_validation_class = TimeUUIDType;
276b8544-2cb7-11e0-9c0d-e700f669bcfc

[default@twissandra] create column family timeline with comparator = LongType and
default_validation_class = TimeUUIDType;
80119cc5-2cb7-11e0-9c0d-e700f669bcfc

Inserting and Retrieving Columns

Though in production scenarios it is more practical to insert columns and column values programmatically, it is possible to use the Cassandra CLI for these operations. The example in this section illustrates using the set and get commands to insert and retrieve some columns in the users column family.

The following commands create and then get a user record for “jsmith.” The record includes a value for the password column we created when we created the column family. Note that the user name “jsmith” is the row key – not a column.

[default@twissandra] set users['jsmith']['password']='ch@ngem3';
Value inserted.
[default@twissandra] get users['jsmith'];
=> (column=password, value=6368406e67656d33, timestamp=1295635612024000)

Note

For all CLI write and read operations such as these example commands, the consistency level is ONE. Different consistency levels are not available with the CLI, though all levels are available when writing or reading programmatically.

Using Human-Readable Data

Applications on Cassandra may require values in bytes type or other formats that are not naturally human-readable. The CLI allows you to temporarily translate machine-readable data to human-readable formats in order to work with data more easily.

To retrieve a human-readable form of a particular value, such as the UTF8Type password value used above, add as ascii to the get command:

[default@twissandra] get users['jsmith']['password'] as ascii;
=> (column=password, value=ch@ngem3, timestamp=1295635612024000)
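To see what the translation amounts to, the hexadecimal values printed by the earlier get commands can be decoded by hand. The following Python sketch is only an illustration; decode_hex_value is a hypothetical helper, not part of the CLI or any Cassandra client.

```python
def decode_hex_value(hex_text, encoding="ascii"):
    """Decode a hex string, as printed by the CLI, into readable text."""
    return bytes.fromhex(hex_text).decode(encoding)


# Values copied from the CLI output shown in this document.
print(decode_hex_value("6368406e67656d33"))  # the password column
print(decode_hex_value("74656e"))            # the session_token column
```

Each pair of hex digits is one byte, so 63 68 40 6e 67 65 6d 33 spells out the password that was inserted with the set command.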

Or, for more comprehensive translation of values, you can use the assume command. This command lets you select any one of the following attributes to view as a specified type:

  • keys (row key)
  • validator
  • comparator
  • sub_comparator (for sub-columns in a column family of type super)

For example, we could issue a command that assumes integer type for the row keys in the users column family. This would render human-readable values like ‘jsmith’ and ‘jbellis’ as numbers; then, if we further assumed the comparator for users as integer, we could take advantage of that sort order to perform range queries (given the appropriate partitioner, and other conditions). The following example displays the users row keys as integers:

[default@twissandra] assume users keys as integer;
Assumption for column family 'users' added successfully.
[default@twissandra] list users limit 2;
-------------------
RowKey: 118070570874734
=> (column=8097880544751088228, value=6368406e67656d33, timestamp=129710078100)
-------------------
RowKey: 29944535281592691
=> (column=8097880544751088228, value=6368406e67656d33, timestamp=129624362200)
2 Rows Returned.
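The numbers in this output are the same bytes read another way. As a sketch, assuming the integer type reads the raw bytes as a big-endian integer (which agrees with the values printed above), the conversion can be reproduced in Python. Both helper names are hypothetical.

```python
def bytes_as_integer(raw):
    """Read raw bytes as a big-endian signed integer."""
    return int.from_bytes(raw, byteorder="big", signed=True)


def integer_as_text(number):
    """Turn a non-negative integer back into the ASCII text it encodes."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, byteorder="big").decode("ascii")


print(bytes_as_integer(b"kbrown"))
print(integer_as_text(29944535281592691))    # second row key in the output
print(integer_as_text(8097880544751088228))  # the column name in the output
```

Decoding the values this way shows that the two row keys listed are kbrown and jbellis, and that the column name is password.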

You can use both assume and as <type> with any one of these valid type values:

  • bytes
  • integer
  • long
  • lexicaluuid
  • timeuuid
  • utf8
  • ascii

Setting an Expiring Column

When you set a column in Cassandra, you can optionally set an expiration time, or “time-to-live” (ttl) attribute for it. In this example we will imagine that the user jsmith gets assigned a web session token that lasts for ten days before he needs to log in again. To accomplish this, the column session_token is set with a ttl value of 864000:

[default@twissandra] set users['jsmith']['session_token'] = 'ten' with ttl=864000;
Value inserted.
[default@twissandra] get users['jsmith'] as ascii;
=> (column=password, value=ch@ngem3, timestamp=1295635612024000)
=> (column=session_token, value=74656e, timestamp=1295898172256000, ttl=864000)
Returned 2 results.

After ten days, or 864,000 seconds, have elapsed since the setting of this column, its value will no longer be returned by read operations. Note, however, that the value is not actually deleted from disk until normal Cassandra tombstoning and compaction processes are completed.
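The ttl arithmetic can be checked with a few lines of Python. This sketch assumes that the timestamp printed by the CLI is in microseconds since the Unix epoch, and it treats that timestamp as the moment of the write, so the result is only an approximation of when the server would expire the column.

```python
from datetime import datetime, timedelta, timezone

# Ten days expressed in seconds, the unit used by the ttl attribute.
ttl_seconds = int(timedelta(days=10).total_seconds())

# Timestamp of the session_token column in the example output
# (assumed here to be microseconds since the epoch).
write_timestamp_us = 1295898172256000
written_at = datetime.fromtimestamp(write_timestamp_us // 1_000_000, tz=timezone.utc)

# Approximate moment after which reads stop returning the column.
expires_at = written_at + timedelta(seconds=ttl_seconds)

print(ttl_seconds)
print(written_at.isoformat())
print(expires_at.isoformat())
```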

Indexing a Column

The CLI can be used to create secondary indexes, or indexes on column values. In this example, we will update the column family users with the new columns state and birth_date, the latter of which will be indexed. Note the index_type specification at the end of the last line of this example command:

[default@twissandra] update column family users with comparator = UTF8Type
...  and column_metadata = [{column_name: password, validation_class: UTF8Type},
... {column_name: state, validation_class: UTF8Type},
... {column_name: birth_date, validation_class: LongType, index_type: KEYS}];

Because of the secondary index created for the column birth_date, its values can be queried directly for users born in a given year as follows:

[default@twissandra] get users where birth_date = 1973;
-------------------
RowKey: jsmith
=> (column=birth_date, value=1973, timestamp=1296677866680000)
=> (column=password, value=ch@ngem3, timestamp=1296243370505000)
=> (column=session_token, value=74656e, timestamp=1295898172256000, ttl=864000)
=> (column=state, value=UT, timestamp=1296677364573000)

Using the CLI you can also create a secondary index on an existing column. For example, we could update the users column family to add index_type: KEYS for the state column:

[default@twissandra] update column family users with comparator = UTF8Type
...  and column_metadata = [{column_name: password, validation_class: UTF8Type},
... {column_name: state, validation_class: UTF8Type, index_type: KEYS},
... {column_name: birth_date, validation_class: LongType, index_type: KEYS}];

Because of the secondary index created for the state column, its values can be queried directly for users in a given state. Additionally, because the query includes an equality predicate on the indexed state column, Cassandra can also apply a range condition on birth_date: it uses the state predicate as the primary lookup and filters the matching rows on birth_date with a nested loop:

[default@twissandra] get users where state = 'TX' and birth_date > 1970;
RowKey: jbellis
=> (column=birth_date, value=1975, timestamp=1291333936242000)
=> (column=password, value=ch@ngem3, timestamp=1296243370505000)
=> (column=state, value=TX, timestamp=1291334909266000)
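The way such a query is evaluated can be sketched with an in-memory model. The Python below is a simplified illustration, not Cassandra's index implementation: the rows dictionary, build_index, and users_in_state_born_after are hypothetical names, and the sample data is taken from the example output in this document.

```python
from collections import defaultdict

# Sample rows copied from the example output (only the relevant columns).
rows = {
    "jsmith": {"state": "UT", "birth_date": 1973},
    "jbellis": {"state": "TX", "birth_date": 1975},
    "kbrown": {},  # no state or birth_date column
}


def build_index(rows, column):
    """Map each value of the indexed column to the row keys that hold it."""
    index = defaultdict(list)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].append(key)
    return index


state_index = build_index(rows, "state")


def users_in_state_born_after(state, year):
    """Look up candidates by state, then filter them on birth_date."""
    candidates = state_index.get(state, [])  # primary, indexed predicate
    return [
        key for key in candidates            # nested-loop style filter
        if "birth_date" in rows[key] and rows[key]["birth_date"] > year
    ]


print(users_in_state_born_after("TX", 1970))
```

The equality predicate narrows the search to a short candidate list, and only those candidates are checked against the range condition.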

Retrieving Multiple Rows and Columns

Retrieving multiple rows and performing updates on column values are among the operations where high-level client APIs give you much more powerful functionality than the CLI. However, you can use the CLI to retrieve all columns in a row, specific columns from a row, or a list of all rows in a column family.

To get all columns for a row key, as we have demonstrated in some of the above examples:

[default@twissandra] get users['jsmith'];
=> (column=password, value=6368406e67656d33, timestamp=1295635612024000)
=> (column=session_token, value=74656e, timestamp=1295898172256000, ttl=864000)

To get a specific column or columns for a row, you can specify the column in the get command. Note the use of as to retrieve a human-readable value:

[default@twissandra] get users['jsmith']['session_token'] as ascii;
=> (column=session_token, value=ten, timestamp=1295898172256000, ttl=864000)

You can retrieve all rows in a column family using the list command, optionally controlling the number of records retrieved by specifying a limit value as shown:

[default@twissandra] list users limit 5;
-------------------
RowKey: kbrown
=> (column=password, value=ch@ngem3, timestamp=1296243429992000)
-------------------
RowKey: jbellis
=> (column=birth_date, value=1975, timestamp=1291333936242000)
=> (column=password, value=ch@ngem3, timestamp=1296243370505000)
=> (column=state, value=TX, timestamp=1291334909266000)
-------------------
RowKey: jsmith
=> (column=birth_date, value=1973, timestamp=1296677866680000)
=> (column=password, value=ch@ngem3, timestamp=1296243370505000)
=> (column=session_token, value=74656e, timestamp=1295898172256000, ttl=864000)
=> (column=state, value=UT, timestamp=1296677364573000)
-------------------
RowKey: jbrown
=> (column=password, value=ch@ngem3, timestamp=1296243409963000)
=> (column=session_token, value=ten, timestamp=1296505684508000, ttl=864000)
-------------------
RowKey: msmith
=> (column=password, value=ch@ngem3, timestamp=1296243420785000)

5 Rows Returned.

Deleting Rows

The Cassandra CLI provides the del command to delete a row, column or subcolumn. In this example we will delete user jbrown’s session token column, and then delete jbrown’s row entirely.

[default@twissandra] del users['jbrown']['session_token'];
column removed.
[default@twissandra] get users ['jbrown'];
=> (column=password, value=6368406e67656d33, timestamp=1296243409963000)
Returned 1 results.
[default@twissandra] del users ['jbrown'];
row removed.
[default@twissandra] get users ['jbrown'];
Returned 0 results.

Note, however, that the phenomenon called “range ghosts” in Cassandra may mean that deleted rows are still retrieved by list commands or other get operations. Deleted values, including range ghosts, are removed completely by the first compaction that runs after the column family's tombstone grace period (GCGraceSeconds) has elapsed following the deletion.
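A toy model may help to picture why a deleted row can still appear in a listing. The Python sketch below is a deliberate simplification and not Cassandra's storage engine: ToyColumnFamily and TOMBSTONE are hypothetical names, and the compact method ignores any waiting period before deletion markers are purged.

```python
TOMBSTONE = object()  # marker standing in for a deletion record


class ToyColumnFamily:
    """In-memory stand-in for a column family."""

    def __init__(self):
        self.rows = {}

    def set(self, key, column, value):
        self.rows.setdefault(key, {})[column] = value

    def delete_row(self, key):
        # Deleting marks each column instead of removing the row outright.
        for column in self.rows.get(key, {}):
            self.rows[key][column] = TOMBSTONE

    def get(self, key):
        # Reads skip columns that carry a deletion marker.
        columns = self.rows.get(key, {})
        return {c: v for c, v in columns.items() if v is not TOMBSTONE}

    def list_keys(self):
        # A range scan still sees the key, even if every column is deleted.
        return list(self.rows)

    def compact(self):
        # Compaction keeps only rows that still have live columns.
        live = {}
        for key in self.rows:
            columns = self.get(key)
            if columns:
                live[key] = columns
        self.rows = live


users = ToyColumnFamily()
users.set("jbrown", "password", "ch@ngem3")
users.set("msmith", "password", "ch@ngem3")
users.delete_row("jbrown")

print(users.get("jbrown"))   # no live columns
print(users.list_keys())     # jbrown still listed: a "range ghost"
users.compact()
print(users.list_keys())     # the ghost is gone after compaction
```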

Dropping Column Families and Keyspaces

With Cassandra CLI commands you can drop column families and keyspaces in much the same way that tables and databases are dropped in relational models. This example shows the commands to drop our example users column family and then drop the twissandra keyspace altogether:

[default@twissandra] drop column family users;
ade3bc44-236f-11e0-8410-56547f39a44b
[default@twissandra] drop keyspace twissandra;
30448a50-28d8-11e0-9c0d-e700f669bcfc