Apache Cassandra 0.6 Documentation

Using Column Families as Indexes

A common data storage strategy is to use one column family to store data and one or more column families to serve as indexes for the data.

The right indexing strategy depends heavily on the type of data being indexed. Keep in mind that a row is never split across nodes: every column of an index row is stored on the same node (and its replicas), so that node may carry greater-than-average load if a single-row index is used heavily. Some of the possible indexing strategies are listed below.

One-to-One

  • One indexed value matches one data row key
  • Each index is a single row with one column per indexed data row
  • Index row key is the name of your index
  • Index column names are the values being indexed
  • Index column values are the keys of the matching data rows
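The one-to-one layout can be sketched with plain dictionaries. This is a hypothetical in-memory model, not the Cassandra Thrift API: a column family is a dict mapping row keys to `{column name: value}` dicts, and the helper names (`build_one_to_one_index`, `lookup`, `range_lookup`) are illustrative only. Because Cassandra returns a row's columns sorted by the column family's comparator, the sketch uses `sorted()` to stand in for that ordering.

```python
# Hypothetical in-memory model, not the Cassandra Thrift API:
# a column family is a dict mapping row key -> {column name: column value}.

def build_one_to_one_index(data_cf, indexed_column, index_name):
    """Return an index column family holding a single row named index_name.

    Column names are the indexed values; column values are data row keys.
    """
    index_row = {}
    for row_key, columns in data_cf.items():
        if indexed_column not in columns:
            continue  # rows without the column are simply not indexed
        value = columns[indexed_column]
        # One-to-one: each indexed value must point at exactly one data row.
        if value in index_row and index_row[value] != row_key:
            raise ValueError(f"indexed value {value!r} is not unique")
        index_row[value] = row_key
    return {index_name: index_row}


def lookup(index_cf, index_name, value):
    """Fetch the data row key for one indexed value (None if absent)."""
    return index_cf[index_name].get(value)


def range_lookup(index_cf, index_name, start, finish):
    """Row keys whose indexed value lies in [start, finish], in value order.

    sorted() stands in for the comparator ordering Cassandra applies.
    """
    index_row = index_cf[index_name]
    return [index_row[v] for v in sorted(index_row) if start <= v <= finish]


users = {
    "user1": {"email": "alice@example.com", "name": "Alice"},
    "user2": {"email": "bob@example.com", "name": "Bob"},
    "user3": {"email": "carol@example.com", "name": "Carol"},
}
users_by_email = build_one_to_one_index(users, "email", "users_by_email")
print(lookup(users_by_email, "users_by_email", "bob@example.com"))  # user2
print(range_lookup(users_by_email, "users_by_email", "a", "c"))  # ['user1', 'user2']
```

Since the whole index is one row, every lookup and range scan against it is served by the same node, which is the load concern noted earlier.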

One-to-Several

  • One indexed value matches several row keys
  • Each index is a single row with one super column per indexed value
  • Index row key is the name of your index
  • Index super column names are the values being indexed
  • Index sub-column names are the keys of the matching data rows
  • Sub-column values are left empty; only the sub-column names carry information
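A minimal sketch of the one-to-several layout, again as a hypothetical in-memory model rather than the Thrift API: the single index row maps each indexed value (a super column name) to a dict of sub-columns whose names are data row keys and whose values are empty bytes. The function names and sample data are illustrative assumptions.

```python
# Hypothetical in-memory model, not the Cassandra Thrift API:
# index row -> {super column name: {sub-column name: sub-column value}}.

def build_one_to_several_index(data_cf, indexed_column, index_name):
    """Single index row with one super column per indexed value.

    Sub-column names are data row keys; sub-column values are empty bytes.
    """
    index_row = {}
    for row_key, columns in data_cf.items():
        if indexed_column not in columns:
            continue
        super_column = index_row.setdefault(columns[indexed_column], {})
        super_column[row_key] = b""  # the name is the payload; value unused
    return {index_name: index_row}


def keys_for_value(index_cf, index_name, value):
    """All data row keys for one indexed value, sorted by sub-column name."""
    return sorted(index_cf[index_name].get(value, {}))


users = {
    "user1": {"city": "Austin"},
    "user2": {"city": "Boston"},
    "user3": {"city": "Austin"},
}
users_by_city = build_one_to_several_index(users, "city", "users_by_city")
print(keys_for_value(users_by_city, "users_by_city", "Austin"))  # ['user1', 'user3']
print(keys_for_value(users_by_city, "users_by_city", "Denver"))  # []
```

Each super column here is read as a unit, so this layout fits values that match only a handful of row keys; once a value matches many keys, the one-to-many layout gives each value its own row instead.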

One-to-Many

  • One indexed value matches many row keys
  • Each indexed value gets its own index row
  • Each index row key is the indexed value, i.e. the value of the indexed column in the data rows
  • Column names may be:
    • The data row keys, if no other ordering is needed
    • A value used to order the keys, in which case the column values hold the data row keys
  • Use one index column family for each column name being indexed
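Both one-to-many variants can be sketched in the same hypothetical in-memory model (not the Thrift API). Each call to `build_one_to_many_index` produces one index column family for one indexed column, and each distinct indexed value becomes its own row. In the ordered variant the sketch makes the column name an `(ordering value, row key)` pair; appending the row key is our own assumption, added so that two data rows sharing an ordering value do not overwrite each other's column. All names and sample data are illustrative.

```python
# Hypothetical in-memory model, not the Cassandra Thrift API:
# index column family -> {indexed value (row key): {column name: value}}.

def build_one_to_many_index(data_cf, indexed_column, order_column=None):
    """Return one index column family for indexed_column.

    Each distinct indexed value becomes its own index row. Without
    order_column, column names are the data row keys (values empty). With
    order_column, column names are (ordering value, row key) pairs and the
    column values hold the data row keys.
    """
    index_cf = {}
    for row_key, columns in data_cf.items():
        if indexed_column not in columns:
            continue
        index_row = index_cf.setdefault(columns[indexed_column], {})
        if order_column is None:
            index_row[row_key] = b""
        else:
            # Assumption: appending the row key keeps column names unique
            # when two data rows share the same ordering value.
            index_row[(columns[order_column], row_key)] = row_key
    return index_cf


def keys_in_order(index_cf, value, ordered):
    """Data row keys for one indexed value, in column-name order."""
    index_row = index_cf.get(value, {})
    names = sorted(index_row)
    return [index_row[n] for n in names] if ordered else names


posts = {
    "post1": {"author": "alice", "posted_at": 30},
    "post2": {"author": "alice", "posted_at": 10},
    "post3": {"author": "bob", "posted_at": 20},
}
posts_by_author = build_one_to_many_index(posts, "author")
posts_by_author_time = build_one_to_many_index(posts, "author", "posted_at")
print(keys_in_order(posts_by_author, "alice", ordered=False))  # ['post1', 'post2']
print(keys_in_order(posts_by_author_time, "alice", ordered=True))  # ['post2', 'post1']
```

Because each indexed value lives in its own row, different values spread across the cluster instead of concentrating on the single node that holds a one-row index.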