CQL for Cassandra 2.x

Using collections

Cassandra includes collection types that provide an improved way of handling tasks, such as building multiple email address capability into tables. Observe the following limitations of collections:

  • The maximum size of an item in a collection is 64K.
  • Keep collections small to prevent delays during querying because Cassandra reads a collection in its entirety. The collection is not paged internally.

    As discussed earlier, collections are designed to store only a small amount of data.

  • Never insert more than 64K items in a collection.

    If you insert more than 64K items into a collection, only 64K of them will be queryable, resulting in data loss.

You can expire each element of a collection by setting an individual time-to-live (TTL) property.

Using the set type

A set stores a group of elements that are returned in sorted order when queried. A column of type set consists of unordered unique values. Using the set data type, you can solve the multiple email problem in an intuitive way that does not require a read before adding a new email address.

Procedure

  1. Define a set, emails, in the users table to accommodate multiple email address.
    CREATE TABLE users (
      user_id text PRIMARY KEY,
      first_name text,
      last_name text,
      emails set<text>
    );
  2. Insert data into the set, enclosing values in curly brackets.
    Set values must be unique.
    INSERT INTO users (user_id, first_name, last_name, emails)
      VALUES('frodo', 'Frodo', 'Baggins', {'f@baggins.com', 'baggins@gmail.com'});
  3. Add an element to a set using the UPDATE command and the addition (+) operator.
    UPDATE users
      SET emails = emails + {'fb@friendsofmordor.org'} WHERE user_id = 'frodo';
  4. Retrieve email addresses for frodo from the set.
    SELECT user_id, emails FROM users WHERE user_id = 'frodo';
    When you query a table containing a collection, Cassandra retrieves the collection in its entirety; consequently, keep collections small enough to be manageable, or construct a data model to replace collections that can accommodate large amounts of data.

    Cassandra returns results in an order based on the type of the elements in the collection. For example, a set of text elements is returned in alphabetical order. If you want elements of the collection returned in insertion order, use a list.

     user_id | emails
    ---------+-------------------------------------------------------------------
     frodo   | {"baggins@caramail.com","f@baggins.com","fb@friendsofmordor.org"}
    
  5. Remove an element from a set using the subtraction (-) operator.
    UPDATE users
      SET emails = emails - {'fb@friendsofmordor.org'} WHERE user_id = 'frodo';
  6. Remove all elements from a set by using the UPDATE or DELETE statement.
    A set, list, or map needs to have at least one element; otherwise, Cassandra cannot distinguish the set from a null value.
    UPDATE users SET emails = {} WHERE user_id = 'frodo';
    
    DELETE emails FROM users WHERE user_id = 'frodo';
    A query for the emails returns null.
    SELECT user_id, emails FROM users WHERE user_id = 'frodo';
     user_id | emails
    ---------+---------
     frodo   | null

Using the list type

When the order of elements matters, which may not be the natural order dictated by the type of the elements, use a list. Also, use a list when you need to store same value multiple times. List values are returned according to their index value in the list, whereas set values are returned in alphabetical order, assuming the values are text.

Using the list type you can add a list of preferred places for each user in a users table, and then query the database for the top x places for a user.

Procedure

  1. Add a list declaration to a table by adding a column top_places of the list type to the users table.
    ALTER TABLE users ADD top_places list<text>;
  2. Use the UPDATE command to insert values into the list.
    UPDATE users
      SET top_places = [ 'rivendell', 'rohan' ] WHERE user_id = 'frodo';
  3. Prepend an element to the list by enclosing it in square brackets, and using the addition (+) operator.
    UPDATE users
      SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
  4. Append an element to the list by switching the order of the new element data and the list name in the UPDATE command.
    UPDATE users
      SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';
    
    These update operations are implemented internally without any read-before-write. Appending and prepending a new element to the list writes only the new element.
  5. Add an element at a particular position using the list index position in square brackets
    UPDATE users SET top_places[2] = 'riddermark' WHERE user_id = 'frodo';
    When you add an element at a particular position, Cassandra reads the entire list, and then writes only the updated element. Consequently, adding an element at a particular position results in greater latency than appending or prefixing an element to a list.
  6. Remove an element from a list using the DELETE command and the list index position in square brackets. For example, remove modor, leaving the shire, rivendell, and riddermark.
    DELETE top_places[3] FROM users WHERE user_id = 'frodo';
  7. Remove all elements having a particular value using the UPDATE command, the subtraction operator (-), and the list value in square brackets. For example, remove riddermark.
    UPDATE users
      SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';
    The former, indexed method of removing elements from a list requires a read internally. Using the UPDATE command as shown here is recommended over emulating the operation client-side by reading the whole list, finding the indexes that contain the value to remove, and then removing those indexes. This emulation would not be thread-safe. If another thread/client prefixes elements to the list between the read and the write, the wrong elements are removed. Using the UPDATE command as shown here does not suffer from that problem.
  8. Query the database for a list of top places.
    SELECT user_id, top_places FROM users WHERE user_id = 'frodo';
    Results show:
     user_id | top_places
    ---------+----------------------------
       frodo | ['the shire', 'rivendell']
                    

Using the map type

As its name implies, a map maps one thing to another. A map is a name and a pair of typed values. Using the map type, you can store timestamp-related information in user profiles. Each element of the map is internally stored as one Cassandra column that you can modify, replace, delete, and query. Each element can have an individual time-to-live and expire when the TTL ends.

Procedure

  1. Add a todo list to every user profile in an existing users table using the CREATE TABLE or ALTER statement, specifying the map collection and enclosing the pair of data types in angle brackets.
    ALTER TABLE users ADD todo map<timestamp, text>;
  2. Set or replace map data, using the INSERT or UPDATE command, and enclosing the timestamp and text values in a map collection: curly brackets, separated by a colon.
    UPDATE users
      SET todo =
      { '2012-9-24' : 'enter mordor',
      '2014-10-2 12:00' : 'throw ring into mount doom' }
      WHERE user_id = 'frodo';
  3. Set a specific element using the UPDATE command, enclosing the timestamp of the element in square brackets, and using the equals operator to map the value to that timestamp.
    UPDATE users SET todo['2014-10-2 12:00'] = 'throw my precious into mount doom'
      WHERE user_id = 'frodo';
  4. Use INSERT to specify data in a map collection.
    INSERT INTO users (user_id, todo) VALUES ('frodo', { '2013-9-22 12:01' : 'birthday wishes to Bilbo', '2013-10-1 18:00': 'Check into Inn of Pracing Pony'}) ;
    In Cassandra 2.1.1 and later, you can add map elements using this syntax:
    UPDATE users SET todo = todo + { '2013-9-22 12:01' : 'birthday wishes to Bilbo', '2013-10-1 18:00': 'Check into Inn of Pracing Pony'} WHERE user_id='frodo';
    Inserting this data into the map replaces the entire map.
  5. Delete an element from the map using the DELETE command and enclosing the timestamp of the element in square brackets:
    DELETE todo['2013-9-22 12:01'] FROM users WHERE user_id = 'frodo';
    In Cassandra 2.1.1 and later, you can delete multiple map elements using this syntax:
    UPDATE users SET todo=todo - {'2013-9-22 12:01','2013-10-01 18:00:00-0700'} WHERE user_id='frodo';
  6. Retrieve the todo map.
    SELECT user_id, todo FROM users WHERE user_id = 'frodo';
    The order of the map output depends on the type of the map.
  7. Compute the TTL to use to expire todo list elements on the day of the timestamp, and set the elements to expire.
    UPDATE users USING TTL <computed_ttl>
      SET todo['2012-10-1'] = 'find water' WHERE user_id = 'frodo';
Show/hide