Using Column Families as Indexes
A common data storage strategy is to use one column family to store
data and one or more column families to serve as indexes for the data.
The strategy used for indexing data should depend heavily on the type
of data being indexed. Keep in mind that each row in an index is stored
on a single node (and its replicas), so that node may experience
greater-than-average load if a single-row index is used heavily. Some of
the possible indexing strategies are listed below.
One-to-One
- One indexed value matches one data row key
- Each index is a single row with one column per indexed data row
- Index row key is the name of your index
- Index column names are the values being indexed
- Index column values are the row keys being indexed
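The one-to-one layout above can be sketched with plain Python dicts standing in for column families (row key -> {column name -> column value}). This is only an illustrative model, not a Cassandra client API; the names `users`, `indexes`, `users_by_email`, `index_one_to_one`, and `lookup_one_to_one` are hypothetical.

```python
# Column families modeled as nested dicts:
#   row key -> {column name -> column value}
# All names here are hypothetical and chosen only for illustration.
users = {
    "user-1": {"email": "alice@example.com", "name": "Alice"},
    "user-2": {"email": "bob@example.com", "name": "Bob"},
}

# The index column family. Each index is a single row in it.
indexes = {}


def index_one_to_one(index_cf, index_name, indexed_value, row_key):
    """Add one column to the index row: name = indexed value, value = row key."""
    index_cf.setdefault(index_name, {})[indexed_value] = row_key


def lookup_one_to_one(index_cf, index_name, indexed_value):
    """Return the data row key for an indexed value, or None if absent."""
    return index_cf.get(index_name, {}).get(indexed_value)


# Build the index: one column per indexed data row.
for row_key, columns in users.items():
    index_one_to_one(indexes, "users_by_email", columns["email"], row_key)

# Two-step read: index lookup first, then the data row itself.
found_key = lookup_one_to_one(indexes, "users_by_email", "bob@example.com")
found_row = users[found_key]
```

Because the whole index is one row, every lookup for `users_by_email` in this model hits the same row, which is the load concern mentioned earlier.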
One-to-Several
- One indexed value matches several row keys
- Each index is a single row with one super column per indexed value
- Index row key is the name of your index
- Index super column names are the values being indexed
- Index sub-column names are the row keys being indexed
- There is no sub-column value
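As a rough sketch of the one-to-several layout, a super column family can be modeled as three levels of dicts (row key -> super column name -> sub-column name -> value). This is a toy model under that assumption, not a real client API; `posts`, `super_indexes`, `posts_by_tag`, and both helper functions are hypothetical names.

```python
# Super column family modeled as:
#   row key -> {super column name -> {sub-column name -> value}}
# All names here are hypothetical and chosen only for illustration.
posts = {
    "post-1": {"tag": "cassandra"},
    "post-2": {"tag": "cassandra"},
    "post-3": {"tag": "python"},
}

# The index super column family. Each index is a single row in it.
super_indexes = {}


def index_one_to_several(index_cf, index_name, indexed_value, row_key):
    """Record row_key as a sub-column name under the indexed value."""
    index_row = index_cf.setdefault(index_name, {})
    # The sub-column carries no value; only its name (the row key) matters.
    index_row.setdefault(indexed_value, {})[row_key] = b""


def lookup_one_to_several(index_cf, index_name, indexed_value):
    """Return the row keys for an indexed value, sorted by sub-column name."""
    sub_columns = index_cf.get(index_name, {}).get(indexed_value, {})
    return sorted(sub_columns)


for row_key, columns in posts.items():
    index_one_to_several(super_indexes, "posts_by_tag", columns["tag"], row_key)

cassandra_posts = lookup_one_to_several(super_indexes, "posts_by_tag", "cassandra")
```

Sorting in the lookup imitates the way columns are kept ordered by name; the empty byte string stands in for the missing sub-column value.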
One-to-Many
- One indexed value matches many row keys
- Each indexed value receives its own row
- Each index row key is the column value shared by the data rows being indexed
- Column names may be:
  - The row keys, if no other ordering is needed
  - A value used for ordering the keys; the column values will contain the row keys
- One column family for each column name being indexed
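Both column-naming variants of the one-to-many layout can be sketched with dicts standing in for column families. This is a minimal model, not a Cassandra client API; `events`, `events_by_city`, `events_by_city_ordered`, the `created` ordering value, and the helper functions are all hypothetical.

```python
# Column families modeled as: row key -> {column name -> column value}
# All names here are hypothetical and chosen only for illustration.
events = {
    "event-1": {"city": "Austin", "created": 1003},
    "event-2": {"city": "Austin", "created": 1001},
    "event-3": {"city": "Boston", "created": 1002},
}

# One index column family per indexed column name (here: "city").
events_by_city = {}          # variant 1: column names are the row keys
events_by_city_ordered = {}  # variant 2: column names are an ordering value


def index_by_key(index_cf, indexed_value, row_key):
    """Variant 1: each indexed value gets its own row; no column value needed."""
    index_cf.setdefault(indexed_value, {})[row_key] = b""


def index_by_order(index_cf, indexed_value, order_value, row_key):
    """Variant 2: column name orders the entries; column value holds the row key.

    Note: two data rows with the same order_value would overwrite each other
    in this simple model, so the ordering value must be unique within a row.
    """
    index_cf.setdefault(indexed_value, {})[order_value] = row_key


def lookup_ordered(index_cf, indexed_value):
    """Return row keys for an indexed value, ordered by column name."""
    index_row = index_cf.get(indexed_value, {})
    return [index_row[name] for name in sorted(index_row)]


for row_key, columns in events.items():
    index_by_key(events_by_city, columns["city"], row_key)
    index_by_order(events_by_city_ordered, columns["city"], columns["created"], row_key)

austin_by_time = lookup_ordered(events_by_city_ordered, "Austin")
```

Since each indexed value has its own row, index rows for different values can land on different nodes, unlike the single-row indexes in the other two strategies.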