DataStax Developer Blog

Schema in Cassandra 1.1

By Jonathan Ellis - February 15, 2012 | 23 Comments

The evolution of schema in Cassandra

When Cassandra was first released several years ago, it closely followed the data model outlined in Google’s Bigtable paper (with the notable addition of SuperColumns — more on these later): ColumnFamilies grouping related columns had to be defined up-front, but column names were just byte arrays interpreted by the application. It would be fair to characterize this early Cassandra data model as “schemaless.”

However, as systems deployed on Cassandra grew and matured, the lack of schema became a pain point. When multiple teams are using the same data, it’s very useful to be able to ask “what data is in this table (or columnfamily)?” without diving into the source code that uses it. And as more codebases share a database, it also becomes more useful to have the database validate that the birth_date column in one row is always an integer.

So, starting with the 0.7 release roughly a year ago, Cassandra has first allowed, then encouraged, telling it about your data types. I’ve taken to describing Cassandra as “schema-optional”: it’s not required, and you can ignore it at first and go back and add it later if you’d rather, but it’s a good habit to get into. Today, doing this in CQL looks familiar:


CREATE TABLE users (
    id uuid PRIMARY KEY,
    name varchar,
    state varchar
);

ALTER TABLE users ADD birth_date int;

(Using UUIDs as a surrogate key is common in Cassandra, so that you don’t need to worry about sequence or autoincrement synchronization across multiple machines.)
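For example, a client can generate the key itself and write it with no cross-node coordination. Here is a minimal sketch (the UUID literal is just an illustration):

INSERT INTO users (id, name, state)
VALUES (550e8400-e29b-41d4-a716-446655440000, 'john', 'VA');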

The best of both worlds

Superficially it may sound like Cassandra is headed back where relational databases started: every column predefined and typed. The big difference is in the practical limitations of Cassandra’s log-structured merge-tree storage engine, compared to RDBMS b-trees.

Without going into too much detail, traditional storage engines allocate room for each column in each row, up front. (Rows that have different sets of columns are grudgingly accommodated via nulls.)

In a static-column storage engine, each row must reserve space for every column

In Cassandra’s storage engine, each row is sparse: for a given row, we store only the columns present in that row. Technically this implies that we store the column names redundantly in each row, trading disk space to gain flexibility. Thus, adding a column to a Cassandra table takes only a few milliseconds regardless of how much data the table holds, whereas a storage engine that must re-allocate space row by row to accommodate the new data can take minutes, hours, or even weeks as the table grows.


In a sparse-column engine, space is only used by columns present in each row

This also means that Cassandra can easily support thousands of columns per table, without wasting space if each row only needs a few of them.

Thus, Cassandra gives you the flexibility normally associated with schemaless systems, while also delivering the benefits of having a defined schema.
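To make the sparseness concrete, here is a minimal sketch against the users table above (the UUID literals are arbitrary); the first row stores nothing at all for state or birth_date:

-- only id and name occupy space on disk for this row
INSERT INTO users (id, name)
VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'alice');

-- this row stores all four columns
INSERT INTO users (id, name, state, birth_date)
VALUES (9b64386a-a820-4bd6-adb5-4cf57b5dcf4b, 'bob', 'TX', 1974);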

Clustering, compound keys, and more

Starting in the upcoming Cassandra 1.1 release, CQL (the Cassandra Query Language) supports defining columnfamilies with compound primary keys. The first column in a compound key definition continues to be used as the partition key, and remaining columns are automatically clustered: that is, all the rows sharing a given partition key will be sorted by the remaining components of the primary key.

For example, consider the sblocks table in the CassandraFS data model:


CREATE TABLE sblocks (
    block_id uuid,
    subblock_id uuid,
    data blob,
    PRIMARY KEY (block_id, subblock_id)
)
WITH COMPACT STORAGE;

The first element of the primary key, block_id, is the partition key, which means that all subblocks of a given block will be routed to the same replicas. Within each block, subblocks are also ordered by subblock_id. DataStax Enterprise uses this property to make sure that SELECT data FROM sblocks WHERE block_id = ? is sequential I/O in subblock_id order.
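Clustering also makes range queries over subblock_id efficient, since the matching columns are stored contiguously on disk. A hedged sketch (my reading of the 1.1 syntax for ranges over clustering columns; the UUID literals are placeholders):

SELECT data FROM sblocks
WHERE block_id = 9b64386a-a820-4bd6-adb5-4cf57b5dcf4b
  AND subblock_id >= 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;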

Compound keys can also be useful when denormalizing data for faster queries. Consider a Twitter data model like Twissandra’s. We have tweet data:


CREATE TABLE tweets (
    tweet_id uuid PRIMARY KEY,
    author varchar,
    body varchar
);

But the most frequent query (“show me the 20 most recent tweets from people I follow”) would be expensive against a normalized model. So we denormalize into another table:


CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id)
);

That is, any time a given author posts a tweet, we look up who follows him and insert a copy of the tweet into each follower’s timeline (see the sketch below). Cassandra orders version 1 UUIDs by their time component, so SELECT * FROM timeline WHERE user_id = ? ORDER BY tweet_id DESC LIMIT 20 requires no sort at query time.

(At the time of this writing, ORDER BY syntax is being finalized; this is my best guess as to what it will look like.)
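Here is a minimal sketch of that fan-out write path (the tweet id is a client-generated version 1 UUID, and I’m assuming the follower list comes from a separate followers table not shown here):

-- store the canonical copy
INSERT INTO tweets (tweet_id, author, body)
VALUES (4d6b1562-5f38-11e1-8f21-0800200c9a66, 'gwashington', 'I chopped down the cherry tree');

-- repeat for each follower of the author
INSERT INTO timeline (user_id, tweet_id, author, body)
VALUES ('jadams', 4d6b1562-5f38-11e1-8f21-0800200c9a66, 'gwashington', 'I chopped down the cherry tree');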

Under the hood and historical notes

Cassandra’s storage engine uses composite columns under the hood to store clustered rows. This means that all the logical rows with the same partition key get stored as a single physical “wide row.” This is why Cassandra supports up to 2 billion columns per (physical) row, and why Cassandra’s old Thrift API has methods to take “slices” of such rows.

To illustrate this, let’s consider three tweets for our timeline data model above:

Raw tweet data

We’ll have timeline entries for jadams, who follows gwashington and jmadison, and ahamilton, who follows gwashington and gmason. I’ve colored these rows by their partition key, the user_id:

Logical representation of the denormalized timeline rows

The physical layout of this data looks like this to Cassandra’s storage engine:

Physical representation of the denormalized timeline rows
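In other words (a rough sketch, abbreviating the tweet ids as t1 from gwashington, t2 from jmadison, and t3 from gmason):

jadams    -> [t1:author][t1:body][t2:author][t2:body]
ahamilton -> [t1:author][t1:body][t3:author][t3:body]

Each physical column name is a composite of the tweet_id and the logical column name, so a user’s entire timeline is stored contiguously in a single wide row.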


The WITH COMPACT STORAGE directive is provided for backwards compatibility with older Cassandra applications, as in the CassandraFS example above; new applications should avoid it. Using COMPACT STORAGE will prevent you from adding new columns that are not part of the PRIMARY KEY. With COMPACT STORAGE, each logical row corresponds to exactly one physical column:

Physical representation of the denormalized timeline rows, WITH COMPACT STORAGE
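For the sblocks table above, for instance, the compact layout stores each subblock as one physical column whose name is the subblock_id and whose value is the data blob (a rough sketch, not actual output):

block_id -> [subblock_id_1: data_1][subblock_id_2: data_2][subblock_id_3: data_3]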

SuperColumns were an early attempt at providing the same kinds of denormalization tools discussed above. They have important limitations (e.g., reading any subcolumn from a SuperColumn pulls the entire SuperColumn into memory) and will eventually be replaced by a composite column implementation with the same API. So if you have an application using SuperColumns, you don’t need to rewrite anything, but if you are starting fresh, you should use the more flexible approach described above.



Comments

  1. Hi Jonathan Ellis,

    Thanks for sharing; composite keys are a great new feature.
    I’d like to see more data model types in future Cassandra release(s).

    Will 1.1 support secondary indexes on composite columns, or should I create a second column family to act as the index, as a workaround?

    Thanks,
    Charlie | DBA

  2. Jonathan Ellis says:

    Indexes on composite columns will be supported (https://issues.apache.org/jira/browse/CASSANDRA-3782, https://issues.apache.org/jira/browse/CASSANDRA-3680), but probably not for 1.1.0.

  3. Punchy says:

    [re adding columns in Cassandra vs. "hours and weeks" for other systems]

    There is no reason why an RDBMS needs to touch data on an add column. MySQL just happens to have an embarrassingly bad implementation. You’d be hard pressed to find another widely used RDBMS, commercial or open source, that is implemented this way. So the issue of how long an add column may take isn’t a good reason in and of itself for using a more loosely defined schema like Cassandra instead of an RDBMS.

    Related topic: does Cassandra touch the data when implementing the equivalent of ‘drop column’?

    Thanks. I appreciate the postings and keep up the good work.

  4. Jonathan Ellis says:

    Reclaiming of space post-drop will be done during compaction, in the background.

  5. Blair Zajac says:

    There’s a typo in the “Logical representation of the denormalized timeline rows” and successive tables: the user_id column should be “jadams”, since it would be odd for “jmadison” to follow himself.

  6. Jonas says:

    In our last test with composite keys vs. super columns, our data set’s size requirements grew roughly by the size of the “super-key” multiplied by the number of rows. In our case this means up to about 99 x 32 bytes per super column entry, which translates into a much higher storage cost per entry.

    Can you comment on whether this implementation will or (perhaps already has?) become more efficient before you replace super columns? Or is there a design pattern that we might not know about that would allow us to keep the size down?

    Keep up the good work!

  7. Jonathan Ellis says:

    @Blair: oops, you’re right. Sorry!

    @Jonas: We’re consciously trading the cheapest resource (disk space) to get more flexibility. In the meantime, compression will be enabled by default starting with 1.1, which should more than make up for it.

  8. Can we have more than 3 columns in the composite primary key?

    E.g.

    CREATE columnfamily emp (
    dept_no int,
    job varchar,
    emp_no int,
    name varchar,
    state varchar,
    PRIMARY KEY (dept_no, job, emp_no)
    );

  9. Jonathan Ellis says:

    You can, although just using that for straight-across ports of relational models will probably not yield the desired results.

  10. Shahryar says:

    In question #8, if we insert dept_no=123, job=programmer, emp_no=9876, name=John and state=CA, then I assume it will create 2 composite columns with row key = 123, as follows:
    programmer:9876:name=John
    programmer:9876:state=CA

    Correct?

  11. Carlos says:

    Is pycassa compatible with composite primary keys?

  12. Hello, I downloaded and am playing around with 1.1. However, when I run cassandra-cli and try the CREATE TABLE above, it does not accept the ‘TABLE’ syntax, even though I see CLI version 1.1.0. Also, when I replace it with ‘column family’, I get a different error: missing EOF at ‘(‘.

  13. Desmond says:

    I have a question on Composite Key and Secondary Index.

    I am using the command line tools cassandra-cli and cqlsh, and I have run into a problem trying to create a composite key in both.

    I want to create a composite column of (City, State) and have a list of hotels for a given city:state pair.

    -- script.txt for cassandra-cli begin --

    create keyspace TEST;
    use TEST;

    create column family Hotels with
    comparator = 'CompositeType(UTF8Type, UTF8Type)'
    and key_validation_class = 'UTF8Type'
    and default_validation_class = 'UTF8Type';

    assume Hotels keys as utf8;
    assume Hotels comparator as utf8;
    assume Hotels validator as utf8;

    set Hotels['AZC_043']['Phoenix:AZ'] = 'Cambria Hotel';

    -- script.txt end --

    I get an error on the last line:
    >>>Line 7 => Not enough bytes to read value of component 0
    >>>InvalidRequestException(why:Not enough bytes to read value of component 0)

    Can someone let me know where I am going wrong on the last line?

    After going through a lot of documents on the web, I could not find an example that illustrates the use of composite key via the command line. If anyone has a pointer, please do let me know. Thanks.

  14. the paul says:

    Sunit: cassandra-cli does not use or accept CQL input. You want cqlsh instead.

  15. cyril says:

    @Shahryar
    You will create 1 column, with 2 rows in the physical representation, or 1 in the logical representation, as explained above.

  16. Vyjayanthi says:

    I am using Cassandra 1.1.1, but whenever I create a column family with composite keys, I get an error as below:

    java.sql.SQLSyntaxErrorException: line 1:117 mismatched input ')' expecting EOF at org.apache.cassandra.cql.jdbc.CassandraStatement.doExecute(CassandraStatement.java:180) at org.apache.cassandra.cql.jdbc.CassandraStatement.executeUpdate(CassandraStatement.java:237) at jdbc.cassandra.JdbcCassandra.main(JdbcCassandra.java:59)

    My query is “CREATE TABLE sblocks (block_id uuid, subblock_id uuid, data blob, PRIMARY KEY (block_id, subblock_id));”

    Can somebody help me out?

  17. 1) I am quite uncomfortable with the storage model of

    CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id)
    );

    With a “create column family” instruction, I could create the storage model of the ‘timeline’ table directly, and the storage I define that way is expected to be exactly the storage model.

    But with the “create table” instruction, I don’t see, at first glance, that the 2nd part of the composite row key is going to be repeated for each column… I see this as a drawback that hides some details and hurts performance forecasting.

    2) After having read this post, I wonder if, for a “create column family” instruction with a composite row key (sk1, sk2), the same storage model will be applied as for the ‘timeline’ table: that is, whether there will be one row with rowkey=sk1, with sk2 used in a composite column name.

    So, if I am using a “create column family” instruction, does the CF defined still correspond exactly to the storage model, or is the “trick” behind “create table” also applied here (producing an altered storage model)?

    3) Is it possible to create, through “create column family”, a CF with a composite rowkey (sk1, sk2) such that:
    - sk1 would be the partition key
    - but every row with (sk1, sk2) would give a different physical row, so that sk2 would not be repeated for each column?

    Thanks.

  18. Niek Sanders says:

    @Vyjayanthi Just spent two hours figuring out the same issue. (Documentation fail!) You need to start cqlsh with the -3 flag, otherwise it will use the older CQL2 protocol.

  19. Ken Krugler says:

    I think the table titled “Logical representation of the denormalized timeline rows” has an error, where the first two rows should have user_id = “jadams”, not “jmadison”. The same holds true for the following two tables, where “jmadison” in the first column should be “jadams”.

  20. johnyjj2 says:

    Hello,

    how can I ensure that the primary key is auto-incremented?

    I create the table as follows:
    CREATE TABLE TableName (KEY uuid PRIMARY KEY, testFromDate timestamp)
    (It can be created even if I don’t include a semicolon.)

    If I change it to CREATE TABLE TableName (KEY uuid PRIMARY KEY AUTO_INCREMENT, testFromDate timestamp)
    I get an exception with 'why' = mismatched input 'AUTO_INCREMENT' expecting ')'

    I insert a new row as follows:
    INSERT INTO TableName (testFromDate) VALUES ('2012-01-01 00:00:00');

    If I use the first way (without AUTO_INCREMENT) I get:
    Expected key 'KEY' to be present in WHERE clause for 'TableName'

    I have read that perhaps I need to force using CQL 3.0. However, I use cassandraemon to connect to Apache Cassandra from C#.NET.

    I use a CQL query inside:
    using (var context = new CassandraContext("localhost", 9160, keyName))
    where keyName is a string with my keyspace name, which already exists. And I don’t see any way to force CQL 3.0 (if this is really what I need to do).

    How can I insert data correctly with INSERT statement?

    Best regards!

  21. Jonathan Ellis says:

    Cassandra doesn’t support auto-incremented primary keys, because in a distributed system the increment would, in the best case, require coordination across the cluster and cause poor performance, and in the worst case cause unavailability when peers go down.

    Instead, we recommend using UUIDs (which can be uniquely generated without coordination) or natural keys.

  22. shruti says:

    I am trying to create a table with a primary key using DSE 2.2, and I am getting

    Bad Request: line 1:135 mismatched input ')' expecting EOF

    in CQL, even though, as mentioned in some posts, I have used CQL v3.

    Can anyone provide some pointers?

  23. Cassandra Schema is somewhat helping my Compression compound Theory
