Apache Cassandra 1.1 Documentation

SELECT

This document corresponds to an earlier product version. Make sure you are using the version that corresponds to your version.

Latest Cassandra documentation | Earlier Cassandra documentation

Retrieves data, including Solr data, from a Cassandra column family.

Synopsis

SELECT <select expression>
 FROM <column family>
 [USING CONSISTENCY <level>]
 [WHERE (<clause>) [AND (<clause>) ...]] [LIMIT <n>]
 [ORDER BY <composite key 2>] [ASC, DESC]

<clause> syntax is:

<relation> [<AND relation> ...]
<primary key name> { = | < | > | <= | >= } <key_value>
<primary key name> IN (<key_value> [,...])

Description

A SELECT expression reads one or more records from a Cassandra column family and returns a result-set of rows. Each row consists of a row key and a collection of columns corresponding to the query.

Unlike the projection in a SQL SELECT, there is no guarantee that the results will contain all of the columns specified because Cassandra is schema-optional. An error does not occur if you request non-existent columns.

Examples

Specifying Columns

The SELECT expression determines which columns, if any, appear in the result:

SELECT * from People;

Select two columns, Name and Occupation, from three rows having employee ids (primary key) 199, 200, or 207:

SELECT Name, Occupation FROM People WHERE empID IN (199, 200, 207);

A simple form is a comma-separated list of column names. The list can consist of a range of column names.

Counting Returned Rows

A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.

Count the number of rows in the users column family:

SELECT COUNT(*) FROM users;

If you do not specify a limit, a maximum of 10,000 rows are returned by default. Using the LIMIT option, you can specify that the query return a greater or fewer number of rows.

SELECT COUNT(*) FROM big_columnfamily;

 count
-------
 10000

SELECT COUNT(*) FROM big_columnfamily LIMIT 50000;

 count
-------
 50000

SELECT COUNT(*) FROM big_columnfamily LIMIT 200000;

 count
--------
 105291

Specifying the Column Family, FROM, Clause

The FROM clause specifies the column family to query. Optionally, specify a keyspace for the column family followed by a period, (.), then the column family name. If a keyspace is not specified, the current keyspace will be used.

Count the number of rows in the Migrations column family in the system keyspace:

SELECT COUNT(*) FROM system.Migrations;

Specifying a Consistency Level

You can optionally specify a consistency level, such as QUORUM:

SELECT * from People USING CONSISTENCY QUORUM;

See tunable consistency for more information about the consistency levels.

Filtering Data Using the WHERE Clause

The WHERE clause filters the rows that appear in the results. You can filter on a key name, a range of keys, or on column values if columns have a secondary index. Row keys are specified using the KEY keyword or key alias defined on the column family, followed by a relational operator, and then a value.

Relational operators are: =, >, >=, <, or <=.

To filter a indexed column, the term on the left of the operator must be the name of the column, and the term on the right must be the value to filter on.

In CQL2, the greater-than and less-than operators (> and <) result in key ranges that are inclusive of the terms. There is no supported notion of strictly greater-than or less-than; these operators are merely supported as aliases to >= and <=. In CQL3, this is no longer the case. Strict bounds are respected in CQL3. For example:

CREATE TABLE scores (
  name text,
  score int,
  date timestamp,
  PRIMARY KEY (username, score)
);

INSERT INTO scores (name, score, date) VALUES ('bob', 42, '2012-06-24');
INSERT INTO scores (name, score, date) VALUES ('bob', 47, '2012-06-25');
INSERT INTO scores (name, score, date) VALUES ('bob', 33, '2012-06-26');
INSERT INTO scores (name, score, date) VALUES ('bob', 40, '2012-06-27');

SELECT date, score FROM scores WHERE name='bob' AND score >= 40;

  date                     | score
--------------------------+-------
 2012-06-27 00:00:00+0000 |    40
 2012-06-24 00:00:00+0000 |    42
 2012-06-25 00:00:00+0000 |    47

SELECT date, score FROM scores WHERE name='bob' AND score > 40;

  date                     | score
--------------------------+-------
 2011-06-24 00:00:00+0000 |    42
 2011-06-25 00:00:00+0000 |    47

WHERE clauses can include comparisons on columns other than the first. As long as all previous key-component columns have already been identified with strict = comparisons, the last given key component column can be any sort of comparison.

Sorting Filtered Data

ORDER BY clauses can only select a single column, and that column has to be the second column in a composite PRIMARY KEY. This holds even for tables with more than 2 column components in the primary key. Ordering can be done in ascending or descending order, default ascending, and specified with the ASC or DESC keywords.

The column you ORDER BY has to be part of a composite primary key.

SELECT * FROM emp where empid IN (103, 100) ORDER BY deptid ASC;

This query returns:

 empid | deptid | first_name | last_name
-------+--------+------------+-----------
   103 |     11 |    charlie |     brown
   100 |     10 |       john |       doe
SELECT * FROM emp where empid IN (103, 100) ORDER BY deptid DESC;

This query returns:

 empid | deptid | first_name | last_name
-------+--------+------------+-----------
   100 |     10 |       john |       doe
   103 |     11 |    charlie |     brown

Specifying Rows Returned Using LIMIT

By default, a query returns 10,000 rows maximum. Using the LIMIT clause, you can change the default limit of 10,000 rows to a lesser or greater number of rows. The default is 10,000 rows.

SELECT * from emp where deptid=10 LIMIT 900;

Querying System Tables

In CQL3, you can query system tables directly to obtain cluster schema information. For example:

USE my_ks;

SELECT * FROM system.schema_keyspaces;

Output

 keyspace|durable_writes|name|strategy_class                             |strategy_options
---------+----------------+------+---------------------------------------------+----------------------
    my_ks|          True|test|org.apache.cassandra.locator.SimpleStrategy|{"replication_factor":"1"}

Examples

Captain Reynolds keeps track of every ship registered by his sensors as he flies through space. Each day-code is a Cassandra row, and events are added in with timestamps.

CREATE TABLE seen_ships (
  day text,
  time_seen timestamp,
  shipname text,
  PRIMARY KEY (day, time_seen));

SELECT * FROM seen_ships
  WHERE day='199-A/4'
  AND time_seen > '7943-02-03'
  AND time_seen < '7943-02-28'
  LIMIT 12;

Set up the tweets and timeline tables described in the Composite Columns section, insert the example data, and use this query to get the tweets of a George Mason's followers.

SELECT * FROM timeline WHERE user_id = gmason
  ORDER BY tweet_id DESC;

Output

user_id | tweet_id                           | author      | body
--------+------------------------------------+-------------+---------------------------------
gmason  |148e9150-1dd2-11b2-0000-242d50cf1fbf|      phenry | Give me liberty or give me death
gmason  |148b0ee0-1dd2-11b2-0000-242d50cf1fbf| gwashington | I chopped down the cherry tree

As of Cassandra 1.1.3, you can also use the IN operator, as shown in this example, to get the tweets of George Mason and Alexander Hamilton's followers:

SELECT * FROM timeline WHERE user_id IN ('gmason', 'ahamilton') ORDER BY tweet_id;

Output

 user_id | tweet_id                           | author      | body
---------+------------------------------------+-------------+-----------------------------------------
   gmason|148b0ee0-1dd2-11b2-0000-e547f5a51eff| gwashington | I chopped down the cherry tree
ahamilton|148b0ee0-1dd2-11b2-0000-e547f5a51eff| gwashington | I chopped down the cherry tree
   gmason|148e9150-1dd2-11b2-0000-e547f5a51eff|      phenry | Give me liberty, or give me death
ahamilton|14937350-1dd2-11b2-0000-e547f5a51eff|      jadams | A government of laws, not men

CQL Commands

CQL Commands CQL Shell Commands
ALTER TABLE ASSUME
ALTER KEYSPACE CAPTURE
BATCH COPY
CREATE TABLE DESCRIBE
CREATE INDEX EXIT
CREATE KEYSPACE SHOW
DELETE SOURCE
DROP TABLE  
DROP INDEX  
DROP KEYSPACE  
INSERT  
SELECT  
TRUNCATE  
UPDATE  
USE