Apache Cassandra 1.2 Documentation

CREATE TABLE

Define a new table.

Synopsis

CREATE TABLE keyspace_name.table_name
( column_definition, column_definition, ...)
WITH property AND property ...

column_definition is:

column_name cql_type
| column_name cql_type PRIMARY KEY
| PRIMARY KEY ( partition_key )
| column_name collection_type

cql_type is a type, other than a collection or counter type type, listed in CQL data types. Exceptions: ADD supports a collection type and also, if the table is a counter, a counter type.

partition_key is:

column_name
| ( column_name1, column_name2, column_name3 ...)
| ((column_name1́, column_name2́), column_name3́,
  column_name4́ . . .)

column_name1 is the partition key. column_name2, column_name3 ... are clustering keys. column_name1́, column_name2́ are partitioning keys. column_name3́, column_name4́... are clustering keys.

collection_type is:

LIST <cql_type>
| SET <cql_type>
| MAP <cql_type, cql_type>

property is a one of the CQL table storage options or a directive. A directive is either:

Synopsis legend

Description

CREATE TABLE creates a new table under the current keyspace. You can also use the alias CREATE COLUMNFAMILY. Valid table names are strings of alphanumeric characters and underscores, which begin with a letter. If you add the keyspace name followed by a period to the name of the table, Cassandra creates the table in the specified keyspace, but does not change the current keyspace; otherwise, if you do not use a keyspace name, Cassandra creates the table within the current keyspace.

Examples:

Defining a primary key column

The only schema information that must be defined for a table is the primary key (or row key) and its associated data type. Unlike earlier versions, CQL 3 does not require a column in the table that is not part of the primary key. A primary key can have any number (1 or more) of component columns.

If the primary key consists of only one column, you can use the keywords, PRIMARY KEY, after the column definition:

CREATE TABLE users (
  user_name varchar PRIMARY KEY,
  password varchar,
  gender varchar,
  session_token varchar,
  state varchar,
  birth_year bigint
);

Alternatively, you can declare the primary key consisting of only one column in the same way as you declare a compound primary key. Do not use a counter column for a key.

Using compound primary keys

Compound primary keys consist of more than one column. Use the keywords, PRIMARY KEY, followed by the comma-separated list of column names enclosed in parentheses.

CREATE TABLE emp (
  empID int,
  deptID int,
  first_name varchar,
  last_name varchar,
  PRIMARY KEY (empID, deptID)
);

Using composite partition keys

The first column declared in the PRIMARY KEY definition is the partition key, as discussed in Compound keys and clustering. Cassandra stores rows that share the partition key on the same physical node. You can declare a composite partition key formed of multiple columns by using an extra set of parentheses to define which columns form the compound partition key.

CREATE TABLE Cats (
  block_id uuid,
  breed text,
  color text,
  short_hair boolean,
  PRIMARY KEY ((block_id, breed), color, short_hair)
);

The composite partition key consists of block_id and breed. The clustering columns, color and short_hair, determine the clustering order of rows that facilitates retrieval.

Defining columns

You assign columns a type during table creation. Column types, other than collection-type columns, are specified as a parenthesized, comma-separated list of column name and type pairs. See CQL data types for the available types.

This example shows how to create a table that includes collection-type columns: map, set, and list.

CREATE TABLE users (
  userid text PRIMARY KEY,
  first_name text,
  last_name text,
  emails set<text>,
  top_scores list<int>,
  todo map<timestamp, text>
);

For information about using collections, see Using collections: set, list, and map.

Setting table options

Using the optional WITH clause and keyword arguments, you can control the configuration of a new table. You can use the WITH clause to specify the attributes of tables listed in CQL 3 table storage properties. For example:

CREATE TABLE MonkeyTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
)
WITH comment='Important biological records'
AND read_repair_chance = 1.0;
CREATE TABLE DogTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
) WITH compression =
    { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 }
  AND compaction =
    { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6 };

You can specify using compact storage or clustering order using the WITH clause.

Using compact storage

When you create a table using compound primary keys, for every piece of data stored, the column name needs to be stored along with it. Instead of each non-primary key column being stored such that each column corresponds to one column on disk, an entire row is stored in a single column on disk. If you need to conserve disk space, use the WITH COMPACT STORAGE directive that stores data essentially the same as it was stored under CQL 2.

CREATE TABLE sblocks (
  block_id uuid,
  subblock_id uuid,
  data blob,
  PRIMARY KEY (block_id, subblock_id)
)
WITH COMPACT STORAGE;

Using the compact storage directive prevents you from adding more than one column that is not part of the PRIMARY KEY.

At this time, updates to data in a table created with compact storage are not allowed. The table with compact storage that uses a compound primary key must define at least one clustering column.

Unless you specify WITH COMPACT STORAGE, CQL creates a table with non-compact storage.

Using clustering order

You can order query results to make use of the on-disk sorting of columns. You can order results in ascending or descending order. The ascending order will be more efficient than descending. If you need results in descending order, you can specify a clustering order to store columns on disk in the reverse order of the default. Descending queries will then be faster than ascending ones.

The following example shows a table definition that changes the clustering order to descending by insertion time.

create table timeseries (
  event_type text,
  insertion_time timestamp,
  event blob,
  PRIMARY KEY (event_type, insertion_time)
)
WITH CLUSTERING ORDER BY (insertion_time DESC);