CQL improvements in Cassandra 2.1

By Sylvain Lebresne -  February 28, 2014 | 21 Comments

A number of improvements/new features have been added to CQL in the upcoming Cassandra 2.1 release, and this post offers to walk through those improvements.

User Defined Types

The first addition to CQL coming in Cassandra 2.1 is the ability to create User Defined Types (or UDT for short). Let's introduce those through an example: suppose you store user profiles (with name, email, ...) and need, for each user, to store a number of addresses. In Cassandra 2.1, with UDTs, you will be able to do this the following way:

  CREATE TYPE address (
      street text,
      city text,
      zip int
  CREATE TABLE user_profiles (
      login text PRIMARY KEY,
      first_name text,
      last_name text,
      email text,
      addresses map<text, frozen<address>>
  // Inserts a user with a home address
  INSERT INTO user_profiles(login, first_name, last_name, email, addresses)
  VALUES ('tsmith',
          { 'home': { street: '1021 West 4th St. #202',
                      city: 'San Fransisco',
                      zip: 94110 }});

  // Adds a work address for our user
  UPDATE user_profiles
     SET addresses = addresses
                   + { 'work': { street: '3975 Freedom Circle Blvd',
                                 city: 'Santa Clara',
                                 zip: 95050 }}
   WHERE login = 'tsmith';

There is a relative flexibility on what the types of the fields of a UDT can be: collections are allowed as well as other UDTs. For instance, suppose that each address should store a few associated phone numbers, and that we want to allow a couple of tags with each phone number (the latter is arguably a bit far-fetched, this is just meant to demonstrate the feature). You can alter the address type above in the following way:

  CREATE TYPE phone (
      number text,
      tags set<text>
  // Add a 'phones' field to address that is a set of the 'phone' UDT above
  ALTER TYPE address ADD phones set<frozen<phone>>;

In other words, UDTs can be nested relatively arbitrarily. However, UDTs have a main limitation in their current implementation, a limitation that one should keep in mind when using them: you cannot update only parts of a UDT value, you have to overwrite the whole thing every time. Put another way, a UDT value, no matter how complex it is, acts much like a blob would. In fact, it wouldn't be incorrect to describe the current implementation of UDTs as "syntactic sugar for blobs". Note that this limitation is not intrinsic to UDTs per-se, it's a limitation of the current implementation and it is possible that it will be somewhat loosened or even entirely lifted in future iterations. But for the time being, one should keep it in mind when using user defined types.

More concretely, we now can update the 'work' address above with

  UPDATE user_profiles
     SET addresses['work'] = {
             street: '3975 Freedom Circle Blvd',
             city: 'Santa Clara',
             zip: 95050,
             phones : {
               {number: '212 221 9165', tags: { 'preferred', 'direct line' }},
               {number: '500 310 2342', tags: { 'fax' }}
   WHERE login = 'tsmith';

And at this point, the example would yield:

  SELECT * FROM user_profiles;
  login  | addresses                                                                                                                                                                                                                                                                       | email            | first_name | last_name
  tsmith | {'home': {street: '1021 West 4th St. #202', city: 'San Fransisco', zip: 94110}, 'work': {street: '3975 Freedom Circle Blvd', city: 'Santa Clara', zip: 95050, phones: {{number: '212 221 9165', tags: {'direct line', 'preferred'}}, {number: '500 310 2342', tags: {'fax'}}}}} | tsmith@gmail.com |        Tom |     Smith


Some last remarks on user defined types:

  • A UDT value is always written in its entirety but it is also always read entirely under the hood (as of the current implementation at least).
    Note that it is possible, syntax wise, to only select some fields of a UDT, for instance

      CREATE TABLE location (
          id int PRIMARY KEY,
          addr frozen<address>, // Reuse the address type from above
          longitude double,
          latitude double
      // The following is allowed
      SELECT addr.street, addr.city FROM location WHERE id=42;

    but such selection will still read the full address (in fact the full CQL
    row) server side. Long story short, UDTs are not meant to store large and complex "documents" as of their current implementation, but
    rather to help make the denormalization of short amount of data more convenient and flexible.

  • While you can add fields to an existing UDT as we've seen above, it is currently not possible to remove a field. It is however possible to rename a field (ALTER TYPE address RENAME street to str33t).
  • It is possible to use a UDT as type of any CQL column, including clustering ones. In that latter case, the ordering induced by the UDT is the one of it's fields in the order they have been declared. Please note however that there is relatively little advantages to be gain in using a UDT on a PRIMARY KEY column, avoid abusing such possibility just because it's available.

Secondary indexes on collections

So far, creating a secondary index on a collection column was not allowed. Cassandra 2.1 fixes that. For instance:

  CREATE TABLE products (
      id int PRIMARY KEY,
      description text,
      price int,
      categories set<text>,
      features map<text, text>
  // This is now allowed in Cassandra 2.1
  CREATE INDEX cat_index ON products(categories);
  CREATE INDEX feat_index ON products(features);
  INSERT INTO products(id, description, price, categories, features)
       VALUES (34134,
               '120-inch 1080p 3D plasma TV',
               {'tv', '3D', 'hdtv'},
               {'screen' : '120-inch', 'refresh-rate' : '400hz', 'techno' : 'plasma'});
  INSERT INTO products(id, description, price, categories, features)
       VALUES (29412,
               '32-inch LED HDTV (black)',
               {'tv', 'hdtv'},
               {'screen' : '32-inch', 'techno' : 'LED'});
  INSERT INTO products(id, description, price, categories, features)
       VALUES (38471,
               '32-inch LCD TV',
               {'tv', 'used'},
               {'screen' : '32-inch', 'techno' : 'LCD'});
  // You can then query those index through CONTAINS
  SELECT id, description FROM products WHERE categories CONTAINS 'hdtv';
   id    | description
   29412 |    32-inch LED HDTV (black)
   34134 | 120-inch 1080p 3D plasma TV
  SELECT id, description FROM products WHERE features CONTAINS '32-inch';
   id    | description
   29412 | 32-inch LED HDTV (black)
   38471 |           32-inch LCD TV

Please note that for map columns, the index created above index the map values. If you prefer indexing the map keys, you can do so by creating a KEYS index and by using CONTAINS KEY:

  DROP INDEX feat_index;
  CREATE INDEX feat_key_index ON products(KEYS(features));
  SELECT id, description
    FROM products
    WHERE features CONTAINS KEY 'refresh-rate';
   id    | description
   34134 | 120-inch 1080p 3D plasma TV

Please note however that a current limitation (to be lifted in the future) is that you cannot index both the keys and values of the same map simultaneously (hence the DROP INDEX call above).

DataStax has many ways for you to advance in your career and knowledge.

You can take free classes, get certified, or read one of our many white papers.

register for classes

get certified

DBA's Guide to NoSQL


  1. DuyHai DOAN says:

    About using UDT (user defined type) within set and maps, how do you ensure equality for unicity ?

    By design, a set should respect its element unicity. With primitive types equality relation is quite straightforward.

    How is it handled with UDT ? Do you compare UDT instances field by field ?

  2. Sylvain Lebresne says:

    Yes, UDTs are compared field by field, in the order the fields have been defined.

  3. Jay says:

    Sweet. Thanks Sylvain.

    Curious, if I remembered, the whole collection (map/set/) was pulled out in Cassandra JVM even if only one value is requested. I assume now, Cassandra server may not need to fetch the whole collection in memory because of the indexes. Can you confirm? Great if you can point to design/implementation detail of this new indexes.


  4. Robert Stupp says:

    Hmmm – just tried UDTs. Very nice feature.

    But isn’t there a way to add partial instances?
    Means: Not to provide all values of a UDT?

    The following insert does not work – error message is:

    code=2200 [Invalid query] message="Invalid user type literal for ...: missing field street

    INSERT INTO user_profiles(login, first_name, last_name, email, addresses)
    VALUES ('tsmith',
    { 'home': {
    city: 'San Fransisco' }});

  5. Jerry Lin says:

    This is awesome! I’ve been digging for exactly this in while data modeling recently. Not having the simple ability to do something like tuples, has been killing me.

    Any estimate on when Cassandra 2.1 will be released and then included as a Datastax Enterprise release?

  6. Asaf says:

    Is there a way to do a case insensitive search on a collections ?

  7. Asaf says:

    Is there a way to get only the rows that match more then one criteria ? for example:

    CREATE TABLE users (
    email text,
    tags list,

    PRIMARY KEY (email)

    CREATE INDEX ON users(tags);
    // Fill some data
    UPDATE users SET email=’my@gmail.com’, tags = {‘A’, ‘B’, ‘C’, ‘D’}

    I want to fetch only the users that have A & B, is that possible ?
    What about fetching the users that have A or B, is that possible ?

    I would expect something like
    select email from users where skills contains {‘A’, ‘B’};
    to work, but it’s not.

  8. Cristian says:

    we’d really need it but I can’t release a sw based on a not-ready-for-production cassandra version.
    Any estimate on 2.1. release date?

  9. James says:

    When will DevCenter be updated to allow 2.1 functionality? Right now we are using 2.1rc7 but get an error when trying to run a contains search on a collection. I assume that DevCenter will release a new parallel version when 2.1 goes to production. Is that accurate? Also, is there a beta version of DevCenter we can test early?

  10. Sachin Sethiya says:

    For filtering data in a collection, is it possible to return the row with the specific key-value pair from the map collection. So instead of returning the entire row, since I am already providing a filtering criteria, I only want that key-value pair from the map and not the entire map content.

  11. Nitin says:

    Can I add set as data type? I have a set of UDT in my bean. What will be the best way to represent it in my table structure. When I try to define it as blob directly it says, field should be a set. Wen I try to use set, it says field should be set

  12. Madheswaran says:

    I using List and its data type is UDT.

    CREATE TYPE fieldmap (
    key text,
    value text

    entity_id uuid PRIMARY KEY,
    begining int,
    domain text,
    domain_type text,
    entname text,
    field_values list,
    gen_type text,
    revision_time timeuuid,
    status_key int,
    status_name text,
    uuid timeuuid );

    CREATE INDEX entity_domain_idx_1 ON galaxy_dev.entity (domain);
    CREATE INDEX entity_field_values_idx_1 ON galaxy_dev.entity (field_values);
    CREATE INDEX entity_global_entity_type_idx_1 ON galaxy_dev.entity (gen_type );

    SELECT * FROM entity WHERE status_key < 3 and field_values contains {key: 'userName', value: 'Sprint5_200002'} and gen_type = 'USER' and domain = 'S4_1017.abc.com' allow filtering;

    The above query return value for some domain and not for many domains but those domains are exist.

    I have single node cassadra DB. Please let me know why this strange behavior from cassandra.

    1. Madheswaran says:

      CREATE TABLE entity (
      entity_id uuid PRIMARY KEY,
      begining int,
      domain text,
      domain_type text,
      entity_template_name text,
      field_values list,
      gen_type text,
      revision_time timeuuid,
      status_key int,
      status_name text,
      uuid timeuuid
      ) ;

  13. DavidHardwick says:

    Can you provide the reasons Cassandra users were asking for UDTs? Is there a typical use case?

    (note, looking at the comments, I recommend doing one page per new feature instead of two features on a page 😉

  14. Minh Vu says:

    Using a Map, can we integrate it with solr index, faceting?

  15. nguyener says:

    1. When creating table which contain a collection of UDT, does it require to add keyword “frozen” for UDT column? I followed your instruction to create a table with a SET (insteads of MAP) address, and got error. I searched around and found the solution (adding “frozen”)
    2. Can you show me the insert query for my table, which is defined as:

    CREATE TABLE user_profiles (
    login text PRIMARY KEY,
    first_name text,
    last_name text,
    email text,
    addresses set<frozen>

  16. Ben says:


    Is it possible to have nested UDT of it’s own type?

    For example UDT A has a field called innerA which is of UDT A too.


  17. Val says:

    1. What is the implication of creating an index on a collection?

    On Datastax’s documentation is says not to use an secondary index on high-cardinality columns because you then query a huge volume of records for a small number of results.

    2. Values in collection types are very likely to be unique. Wouldn’t this be a very high-cardinality column and thus a bad candidate for secondary index?

    3. Another question not directly related to this article:
    if you ALWAYS include the partition key in your query, would adding secondary indices to the query be efficient? That is, if we go back to my question 2: if we always include the partition key in our queries, then even if the cardinality of the secondary index is high, it is still very efficient, correct?

    4. Following question 3, it is efficient if we include multiple secondary indices (and allow filtering) in our queries if we ALWAYS specify the partition key?
    e.g.: select * from table_foo where partiton_key=’bar’ and sec_idx1=’foo’ and sec_idx2=’baz’ ALLOW FILTERING

  18. eric says:

    if i have a map of

    can u query the int using a GREATER or LESSTHAN operator. All i see are examples of above.

    thanks a bunch

  19. purnima says:

    how to upload content from csv in case of map,set,list and UDT using COPY FROM command?


Your email address will not be published. Required fields are marked *

Subscribe for newsletter:

Tel. +1 (408) 933-3120 sales@datastax.com Offices France Germany

DataStax Enterprise is powered by the best distribution of Apache Cassandra™.

© 2017 DataStax, All Rights Reserved. DataStax, Titan, and TitanDB are registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache Cassandra, Apache, Tomcat, Lucene, Solr, Hadoop, Spark, TinkerPop, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.