DataStax Developer Blog

What's new in Cassandra 1.1: Flexible data file placement

By Yuki Morishita - April 6, 2012

Apache Cassandra is designed from the ground up to work well on spinning disks, but it can also leverage the high IOPS of SSDs. (Don’t miss the video and slides about using Cassandra with SSDs from our solutions architect.)

Suppose you have two column families in the same keyspace, “App”: one named “Logs”, whose data is written once and read infrequently, and one named “UserData”, whose data is accessed frequently. You may want to put the frequently accessed column family on an SSD to boost IO performance. At first, it looks like you can achieve this by mounting the SSD on the appropriate data directory, but then you realize that Cassandra stores all column family data files under a single directory for their keyspace, like below:

/var/lib/cassandra/data/App/Logs-hc-1-Data.db
/var/lib/cassandra/data/App/Logs-hc-1-Index.db
...
/var/lib/cassandra/data/App/UserData-hc-1-Data.db
/var/lib/cassandra/data/App/UserData-hc-1-Index.db
...

Before 1.1, you could only use a separate disk per keyspace, not per column family.

More control over data files

In version 1.1, CASSANDRA-2749 changes the way Cassandra stores data files by using separate column family directories within each keyspace directory. In 1.1, the above data files will instead be stored like this:

/var/lib/cassandra/data/App/Logs/App-Logs-hc-1-Data.db
/var/lib/cassandra/data/App/Logs/App-Logs-hc-1-Index.db
...
/var/lib/cassandra/data/App/UserData/App-UserData-hc-1-Data.db
/var/lib/cassandra/data/App/UserData/App-UserData-hc-1-Index.db
...

This allows you to mount an SSD on a particular directory (in this case UserData) to boost the performance of a particular column family. You may notice that the file name format has also changed to include the keyspace name at the beginning. This makes it easy to tell which keyspace and column family a file belongs to when streaming or bulk loading.
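To illustrate why the keyspace prefix helps, here is a minimal Python sketch that splits a 1.1-style file name into its parts. The function and field names (`parse_sstable_name`, `SSTableName`, `version`, `generation`, `component`) are made up for this example and are not part of Cassandra. It assumes that neither the keyspace nor the column family name contains a hyphen, and it only handles the five-part names shown above.

```python
from typing import NamedTuple


class SSTableName(NamedTuple):
    """Parts of a 1.1-style data file name (hypothetical helper type)."""
    keyspace: str
    column_family: str
    version: str      # e.g. "hc"
    generation: int   # e.g. 1
    component: str    # e.g. "Data.db" or "Index.db"


def parse_sstable_name(filename: str) -> SSTableName:
    """Split '<keyspace>-<cf>-<version>-<generation>-<component>'.

    Assumption: keyspace and column family names contain no hyphens,
    so a plain split on '-' yields exactly five fields.
    """
    parts = filename.split("-")
    if len(parts) != 5:
        raise ValueError(f"not a 1.1-style data file name: {filename!r}")
    keyspace, column_family, version, generation, component = parts
    return SSTableName(keyspace, column_family, version, int(generation), component)


if __name__ == "__main__":
    print(parse_sstable_name("App-UserData-hc-1-Data.db"))
```

With the pre-1.1 name `UserData-hc-1-Data.db`, the same function raises `ValueError`, because the keyspace can only be recovered from the parent directory, not from the file name itself.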

What about upgrading?

Do you need to manually move all pre-1.1 data files to the new directory structure before upgrading to 1.1? No. Immediately after Cassandra 1.1 starts, it checks whether the old directory structure is present and, if so, migrates all data files (including backups and snapshots) to the new directory structure. So, just upgrade as you always do (don’t forget to read NEWS.txt first), and you will get more control over data files for free.
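The path mapping involved can be sketched in a few lines of Python. This is not Cassandra's migration code, only an illustration of how an old-layout path corresponds to a new-layout one based on the examples in this post. The function name `new_layout_path` is hypothetical, the sketch assumes column family names contain no hyphens, and it does not cover backups or snapshots.

```python
from pathlib import PurePosixPath


def new_layout_path(old_path: str) -> PurePosixPath:
    """Map a pre-1.1 data file path to its 1.1-style location.

    Old: <data_dir>/<keyspace>/<cf>-<version>-<generation>-<component>
    New: <data_dir>/<keyspace>/<cf>/<keyspace>-<cf>-<version>-<generation>-<component>
    """
    old = PurePosixPath(old_path)
    keyspace = old.parent.name                 # directory holding the file
    column_family = old.name.split("-", 1)[0]  # assumes no '-' in the CF name
    return old.parent / column_family / f"{keyspace}-{old.name}"


if __name__ == "__main__":
    print(new_layout_path("/var/lib/cassandra/data/App/Logs-hc-1-Data.db"))
```

Because the function only computes paths and never touches the disk, it is safe to run against a file listing to preview where files should end up after the upgrade.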

Conclusion

Starting with Cassandra 1.1, data files are stored inside their own column family directory, which lets you control which column family goes on which disk. The migration to the new directory structure happens automatically, so no extra upgrade steps are required. The beta2 version of 1.1 is available for download, so feel free to try it out! Feedback is always appreciated.



Comments

  1. Ankur says:

    Great. But I would like to know whether Cassandra supports spatial queries or not.
    It would be great if I could get a sample tutorial or Java codebase showing how Cassandra (if it does) supports spatial data.

    Thanks and Regards,
    Ankur.
