Apache Cassandra 1.1 Documentation

COPY

This document corresponds to an earlier product version. Make sure you are using the version that corresponds to your version.

Latest Cassandra documentation | Earlier Cassandra documentation

Imports and exports CSV (comma-separated values) data to and from Cassandra 1.1.3 and higher.

Synopsis

COPY <column family name> [ ( column [, ...] ) ]
  FROM ( '<filename>' | STDIN )
  [ WITH <option>='value' [AND ...] ];

COPY <column family name> [ ( column [, ...] ) ]
  TO ( '<filename>' | STDOUT )
  [ WITH <option>='value' [AND ...] ];

Description

Using the COPY options in a WITH clause, you can change the format of the CSV format. The following table describes these options:

COPY Options Default Value Use To
DELIMITER comma (,) Set the character that separates fields in the file.
QUOTE quotation mark(") Set the character that encloses field values. [1]
ESCAPE backslash (\) Set the character that escapes literal uses of the QUOTE character.
HEADER false Set true to indicate that first row of the file is a header.
ENCODING UTF8 Set the COPY TO command to output unicode strings.
NULL an empty string Set the COPY TO command to represent the absence of a value.
[1]Applies only to fields having newline characters.

The ENCODING and NULL options cannot be used in the COPY FROM command.

This table shows that, by default, Cassandra expects the CSV data to consist of fields separated by commas (,), records separated by line separators (a newline, \r\n), and field values enclosed in double-quotation marks (""). Also, to avoid ambiguity, escape a literal double-quotation mark using a backslash inside a string enclosed in double-quotation marks ("\""). By default, Cassandra does not expect the CSV file to have a header record on the first line that consists of the column names. COPY TO includes the header in the output if HEADER=true. COPY FROM ignores the first line if HEADER=true.

COPY FROM a CSV File

By default, when you use the COPY FROM command, Cassandra expects every row in the CSV input to contain the same number of columns. The number of columns in the CSV input is the same as the number of columns in the Cassandra table metadata. Cassandra assigns fields in the respective order. To apply your input data to a particular set of columns, specify the column names in parentheses after the table name.

Note

COPY FROM is intended for importing small datasets (a few million rows or less) into Cassandra. For importing larger datasets, use Cassandra Bulk Loader or the sstable2json / json2sstable utility.

COPY TO a CSV File

For example, assume you have the following table in CQL:

cqlsh> SELECT * FROM test.airplanes;

 name          | mach | manufacturer | year
---------------+------+--------------+------
 P38-Lightning |  0.7 |     Lockheed | 1937

After inserting data into the table, you can copy the data to a CSV file in another order by specifying the column names in parentheses after the table name:

COPY airplanes
(name, mach, year, manufacturer)
TO 'temp.csv'

Specifying the Source or Destination Files

Specify the source file of the CSV input or the destination file of the CSV output by a file path. Alternatively, you can use the STDIN or STDOUT keywords to import from standard input and export to standard output. When using stdin, signal the end of the CSV data with a backslash and period (“\.“) on a separate line. If the data is being imported into a column family that already contains data, COPY FROM does not truncate the column family beforehand.

Examples

Copy a Column Family to a CSV File

  1. Create a keyspace, and using the keyspace, create a column family named airplanes and copy it to a CSV file.

    CREATE TABLE airplanes (
      name text PRIMARY KEY,
      manufacturer ascii,
      year int,
      mach float
    );
    
    INSERT INTO airplanes
      (name, manufacturer, year, mach)
      VALUES ('P38-Lightning', 'Lockheed', 1937, '.7');
    
    COPY airplanes (name, manufacturer, year, mach) TO 'temp.csv';
    1 rows exported in 0.004 seconds.
    
  2. Clear the data from the airplanes column family and import the data from the temp.csv file

TRUNCATE airplanes;

COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';
1 rows imported in 0.087 seconds.

Copy Data from Standard Input to a Column Family

  1. Enter data directly during an interactive cqlsh session, using the COPY command defaults.

    COPY airplanes (name, manufacturer, year, mach) FROM STDIN;
    
  2. At the [copy] prompt, enter the following data:

    "F-14D Super Tomcat", Grumman,"1987", "2.34"
    "MiG-23 Flogger", Russian-made, "1964", "2.35"
    "Su-27 Flanker", U.S.S.R.,"1981", "2.35"
    \.
    3 rows imported in 55.204 seconds.
  3. Query the airplanes column family to see data imported from STDIN:

    SELECT * FROM airplanes;
    

    Output

     name               | manufacturer | year | mach
    --------------------+------+--------------+-------------
     F-14D Super Tomcat |      Grumman | 1987 | 2.35
          P38-Lightning |     Lockheed | 1937 | 0.7
          Su-27 Flanker |     U.S.S.R. | 1981 | 2.35
         MiG-23 Flogger | Russian-made | 1967 | 2.35
    

CQL Commands

CQL Commands CQL Shell Commands
ALTER TABLE ASSUME
ALTER KEYSPACE CAPTURE
BATCH COPY
CREATE TABLE DESCRIBE
CREATE INDEX EXIT
CREATE KEYSPACE SHOW
DELETE SOURCE
DROP TABLE  
DROP INDEX  
DROP KEYSPACE  
INSERT  
SELECT  
TRUNCATE  
UPDATE  
USE