DataStax Developer Blog

CQL improvements in Cassandra 2.1

By Sylvain Lebresne -  February 28, 2014 | 8 Comments

A number of improvements/new features have been added to CQL in the upcoming Cassandra 2.1 release, and this post offers to walk through those improvements.

User Defined Types

The first addition to CQL coming in Cassandra 2.1 is the ability to create User Defined Types (or UDT for short). Let’s introduce those through an example: suppose you store user profiles (with name, email, …) and need, for each user, to store a number of addresses. In Cassandra 2.1, with UDTs, you will be able to do this the following way:

  CREATE TYPE address (
      street text,
      city text,
      zip int
  );
  
  CREATE TABLE user_profiles (
      login text PRIMARY KEY,
      first_name text,
      last_name text,
      email text,
      addresses map<text, address>
  );
  
  // Inserts a user with a home address
  INSERT INTO user_profiles(login, first_name, last_name, email, addresses)
  VALUES ('tsmith',
          'Tom',
          'Smith',
          'tsmith@gmail.com',
          { 'home': { street: '1021 West 4th St. #202',
                      city: 'San Fransisco',
                      zip: 94110 }});

  // Adds a work address for our user
  UPDATE user_profiles
     SET addresses = addresses
                   + { 'work': { street: '3975 Freedom Circle Blvd',
                                 city: 'Santa Clara',
                                 zip: 95050 }}
   WHERE login = 'tsmith';

There is a relative flexibility on what the types of the fields of a UDT can be: collections are allowed as well as other UDTs. For instance, suppose that each address should store a few associated phone numbers, and that we want to allow a couple of tags with each phone number (the latter is arguably a bit far-fetched, this is just meant to demonstrate the feature). You can alter the address type above in the following way:

  CREATE TYPE phone (
      number text,
      tags set<text>
  );
  
  // Add a 'phones' field to address that is a set of the 'phone' UDT above
  ALTER TYPE address ADD phones set<phone>;

In other words, UDTs can be nested relatively arbitrarily. However, UDTs have a main limitation in their current implementation, a limitation that one should keep in mind when using them: you cannot update only parts of a UDT value, you have to overwrite the whole thing every time. Put another way, a UDT value, no matter how complex it is, acts much like a blob would. In fact, it wouldn’t be incorrect to describe the current implementation of UDTs as “syntactic sugar for blobs”. Note that this limitation is not intrinsic to UDTs per-se, it’s a limitation of the current implementation and it is possible that it will be somewhat loosened or even entirely lifted in future iterations. But for the time being, one should keep it in mind when using user defined types.

More concretely, we now can update the 'work' address above with

  UPDATE user_profiles
     SET addresses['work'] = {
             street: '3975 Freedom Circle Blvd',
             city: 'Santa Clara',
             zip: 95050,
             phones : {
               {number: '212 221 9165', tags: { 'preferred', 'direct line' }},
               {number: '500 310 2342', tags: { 'fax' }}
             }
         }
   WHERE login = 'tsmith';

Some last remarks on user defined types:

  • A UDT value is always written in its entirety but it is also always read entirely under the hood (as of the current implementation at least).
    Note that it is possible, syntax wise, to only select some fields of a UDT, for instance

      CREATE TABLE location (
          id int PRIMARY KEY,
          addr address, // Reuse the address type from above
          longitude double,
          latitude double
      );
      
      // The following is allowed
      SELECT addr.street, addr.city FROM location WHERE id=42;
        

    but such selection will still read the full address (in fact the full CQL
    row) server side. Long story short, UDTs are not meant to store large and complex "documents" as of their current implementation, but
    rather to help make the denormalization of short amount of data more convenient and flexible.

  • While you can add fields to an existing UDT as we've seen above, it is currently not possible to remove a field. It is however possible to rename a field (ALTER TYPE address RENAME street to str33t).
  • It is possible to use a UDT as type of any CQL column, including clustering ones. In that latter case, the ordering induced by the UDT is the one of it's fields in the order they have been declared. Please note however that there is relatively little advantages to be gain in using a UDT on a PRIMARY KEY column, avoid abusing such possibility just because it's available.

Secondary indexes on collections

So far, creating a secondary index on a collection column was not allowed. Cassandra 2.1 fixes that. For instance:

  CREATE TABLE products (
      id int PRIMARY KEY,
      description text,
      price int,
      categories set<text>,
      features map<text, text>
  );
  
  // This is now allowed in Cassandra 2.1
  CREATE INDEX cat_index ON products(categories);
  CREATE INDEX feat_index ON products(features);
  
  INSERT INTO products(id, description, price, categories, features)
       VALUES (34134,
               '120-inch 1080p 3D plasma TV',
               9999,
               {'tv', '3D', 'hdtv'},
               {'screen' : '120-inch', 'refresh-rate' : '400hz', 'techno' : 'plasma'});
  
  INSERT INTO products(id, description, price, categories, features)
       VALUES (29412,
               '32-inch LED HDTV (black)',
               929,
               {'tv', 'hdtv'},
               {'screen' : '32-inch', 'techno' : 'LED'});
  
  INSERT INTO products(id, description, price, categories, features)
       VALUES (38471,
               '32-inch LCD TV',
               110,
               {'tv', 'used'},
               {'screen' : '32-inch', 'techno' : 'LCD'});
  
  // You can then query those index through CONTAINS
  SELECT id, description FROM products WHERE categories CONTAINS 'hdtv';
  
   id    | description
  -------+-----------------------------
   29412 |    32-inch LED HDTV (black)
   34134 | 120-inch 1080p 3D plasma TV
  
  SELECT id, description FROM products WHERE features CONTAINS '32-inch';
  
   id    | description
  -------+--------------------------
   29412 | 32-inch LED HDTV (black)
   38471 |           32-inch LCD TV

Please note that for map columns, the index created above index the map values. If you prefer indexing the map keys, you can do so by creating a KEYS index and by using CONTAINS KEY:

  DROP INDEX feat_index;
  CREATE INDEX feat_key_index ON products(KEYS(features));
  
  SELECT id, description
    FROM products
    WHERE features CONTAINS KEY 'refresh-rate';
  
   id    | description
  -------+-----------------------------
   34134 | 120-inch 1080p 3D plasma TV
  

Please note however that a current limitation (to be lifted in the future) is that you cannot index both the keys and values of the same map simultaneously (hence the DROP INDEX call above).



Comments

  1. DuyHai DOAN says:

    About using UDT (user defined type) within set and maps, how do you ensure equality for unicity ?

    By design, a set should respect its element unicity. With primitive types equality relation is quite straightforward.

    How is it handled with UDT ? Do you compare UDT instances field by field ?

  2. Sylvain Lebresne says:

    Yes, UDTs are compared field by field, in the order the fields have been defined.

  3. Jay says:

    Sweet. Thanks Sylvain.

    Curious, if I remembered, the whole collection (map/set/) was pulled out in Cassandra JVM even if only one value is requested. I assume now, Cassandra server may not need to fetch the whole collection in memory because of the indexes. Can you confirm? Great if you can point to design/implementation detail of this new indexes.

    Thanks
    Jay

  4. Robert Stupp says:

    Hmmm – just tried UDTs. Very nice feature.

    But isn’t there a way to add partial instances?
    Means: Not to provide all values of a UDT?

    The following insert does not work – error message is:

    code=2200 [Invalid query] message="Invalid user type literal for ...: missing field street


    INSERT INTO user_profiles(login, first_name, last_name, email, addresses)
    VALUES ('tsmith',
    'Tom',
    'Smith',
    'tsmith@gmail.com',
    { 'home': {
    city: 'San Fransisco' }});

  5. Jerry Lin says:

    This is awesome! I’ve been digging for exactly this in while data modeling recently. Not having the simple ability to do something like tuples, has been killing me.

    Any estimate on when Cassandra 2.1 will be released and then included as a Datastax Enterprise release?

  6. Asaf says:

    Is there a way to do a case insensitive search on a collections ?

  7. Asaf says:

    Is there a way to get only the rows that match more then one criteria ? for example:

    CREATE TABLE users (
    email text,
    tags list,

    PRIMARY KEY (email)
    );

    CREATE INDEX ON users(tags);
    // Fill some data
    UPDATE users SET email=’my@gmail.com’, tags = {‘A’, ‘B’, ‘C’, ‘D’}

    I want to fetch only the users that have A & B, is that possible ?
    What about fetching the users that have A or B, is that possible ?

    I would expect something like
    select email from users where skills contains {‘A’, ‘B’};
    to work, but it’s not.

  8. Cristian says:

    we’d really need it but I can’t release a sw based on a not-ready-for-production cassandra version.
    Any estimate on 2.1. release date?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>