Apache Cassandra 0.7 Documentation

Data Model

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your version of Cassandra.


The Cassandra data model is designed for distributed data on a very large scale. It gives up full ACID transactional guarantees in exchange for significant advantages in performance, availability, and operational manageability.

Some newcomers to Cassandra approach the data model with analogies to relational models – for instance, equating column families with tables, keyspaces with databases, and so on. Experts often prefer to stress the important differences between the two models, such as Cassandra’s flexibility in adding columns without the need to alter a table structure, or the benefits of eventual consistency over row-level locking.
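To make the point about adding columns without altering a table structure concrete, here is a minimal Python sketch that models a single column family as nested maps (row key to a set of named columns). It is a toy built on that assumption, not Cassandra's storage engine or client API; the class name `ToyColumnFamily` and the sample row keys and columns are hypothetical.

```python
from collections import defaultdict


class ToyColumnFamily:
    """Hypothetical in-memory stand-in for one column family.

    Layout: row key -> {column name -> value}. Illustration only;
    this is not a Cassandra client API.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, column_name, value):
        # No schema to alter: any row may gain any column at write time.
        self.rows[row_key][column_name] = value

    def get(self, row_key):
        # Return columns sorted by name, loosely mimicking the way a
        # column family's comparator orders columns within a row.
        return sorted(self.rows.get(row_key, {}).items())


users = ToyColumnFamily()
users.insert("jsmith", "email", "jsmith@example.com")
users.insert("jsmith", "state", "TX")
users.insert("bjones", "email", "bjones@example.com")
# A new column appears on one row only; the other row is unaffected.
users.insert("bjones", "twitter", "@bjones")

print(users.get("jsmith"))  # [('email', 'jsmith@example.com'), ('state', 'TX')]
print(users.get("bjones"))  # [('email', 'bjones@example.com'), ('twitter', '@bjones')]
```

The two rows live in the same column family yet carry different column sets, which is the flexibility a fixed relational table schema does not offer without an explicit ALTER.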

In Cassandra, denormalization is the norm. A standard and very efficient way of working with the Cassandra data model is to create one column family for each expected type of query. With this approach, data is denormalized and structured so that each query can be answered from one or more rows in a single column family.
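The one-column-family-per-query pattern can be sketched in Python with plain dictionaries standing in for column families. The schema here (`posts_by_author`, `posts_by_tag`, and the helper functions) is hypothetical and only models the idea under that assumption; a real application would issue these writes through a Cassandra client.

```python
from collections import defaultdict

# Hypothetical schema: one toy "column family" per query, each a map of
# row key -> {column name -> value}. Names are illustrative only.
posts_by_author = defaultdict(dict)  # answers "posts written by author X"
posts_by_tag = defaultdict(dict)     # answers "posts carrying tag Y"


def add_post(post_id, author, title, tags):
    # Denormalize at write time: the same title is copied into every
    # column family that a query will later read from.
    posts_by_author[author][post_id] = title
    for tag in tags:
        posts_by_tag[tag][post_id] = title


def titles_by_author(author):
    # Each query reads exactly one row of one column family; no joins.
    return [title for _, title in sorted(posts_by_author.get(author, {}).items())]


def titles_by_tag(tag):
    return [title for _, title in sorted(posts_by_tag.get(tag, {}).items())]


add_post("p001", "alice", "Intro to Cassandra", ["cassandra", "nosql"])
add_post("p002", "bob", "Tuning compaction", ["cassandra"])
add_post("p003", "alice", "Modeling time series", ["nosql"])

print(titles_by_author("alice"))   # ['Intro to Cassandra', 'Modeling time series']
print(titles_by_tag("cassandra"))  # ['Intro to Cassandra', 'Tuning compaction']
```

The cost of this design falls on the write side: each post is stored once per query it must answer, so changing a title means updating every column family that holds a copy.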

The rest of this section covers the basics of the Cassandra data model: keyspaces, row keys, column families, super columns, and indexes.