
Ways to Move Data To/From DataStax Enterprise and Cassandra

By Robin Schumacher, SVP and Chief Product Officer -  November 26, 2012 | 7 Comments

One of the questions we're asked most at DataStax is, “How can I move data from other sources into DataStax Enterprise and Cassandra, and vice versa?” I thought I’d quickly outline the top three options you have available.

COPY command

Cassandra 1.1 and higher supplies the COPY command, which mirrors the one the PostgreSQL RDBMS uses for file import/export. The utility runs in Cassandra’s CQL shell, and allows flat file data to be loaded into Cassandra as well as table data to be written out to OS files (nearly all RDBMSs have similar unload utilities). A variety of file formats and delimiters are supported, including comma-separated values (CSV), tabs, and more, with CSV being the default.

The syntax for the COPY command is the following:

COPY <column family name> [ ( column [, ...] ) ]   FROM ( '<filename>' | STDIN )   [ WITH <option>='value' [AND ...] ];

COPY <column family name> [ ( column [, ...] ) ]   TO ( '<filename>' | STDOUT )   [ WITH <option>='value' [AND ...] ];

Below are simple examples of the COPY command in action:

cqlsh> SELECT * FROM airplanes;   
name          | mach | manufacturer | year 
P38-Lightning |  0.7 |     Lockheed | 1937

cqlsh> COPY airplanes (name, mach, year, manufacturer) TO 'temp.csv';
1 rows exported in 0.004 seconds.

cqlsh> TRUNCATE airplanes;

cqlsh> COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';
1 rows imported in 0.087 seconds.

See our online documentation for more information about the COPY command.
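A file fed to COPY FROM just needs to be well-formed delimited text. As a quick illustration (a sketch, not DataStax tooling), the airplanes row above could be staged into temp.csv with Python’s standard csv module; note that the column order in the file must match the column list you give COPY FROM:

```python
import csv

# Rows for the airplanes column family, in the order used by:
#   COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';
rows = [
    ("P38-Lightning", "Lockheed", 1937, 0.7),
]

# csv.writer produces comma-delimited output, which matches
# COPY's default format.
with open("temp.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Sanity check: read the file back the way a CSV parser will see it.
with open("temp.csv", newline="") as f:
    print(list(csv.reader(f)))
# prints [['P38-Lightning', 'Lockheed', '1937', '0.7']]
```

From there, the COPY FROM statement shown earlier loads the file straight into the column family.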


Sqoop

DataStax Enterprise Edition 2.0 and higher includes support for Sqoop, a tool designed to transfer data between an RDBMS and Hadoop. We modified Sqoop so you can not only transfer data from an RDBMS to a Hadoop node in a DataStax Enterprise cluster, but also move data directly into Cassandra.

I wrote a short tutorial you can reference on how to use Sqoop to move data from MySQL into Cassandra that can be mimicked for any other RDBMS. There’s also a demo of Sqoop that you’ll find in the /demos directory of a DataStax Enterprise download/installation.
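For reference, a one-step DSE Sqoop import from MySQL straight into Cassandra looks roughly like the following (a sketch based on the tutorial above; the host names, credentials, database, keyspace, and table names are placeholders you would replace with your own):

```shell
# Hypothetical one-step import from a MySQL table into a Cassandra
# column family; run on a DSE Analytics (Hadoop-enabled) node.
# All hosts, credentials, and schema names below are placeholders.
dse sqoop import \
  --connect jdbc:mysql://mysql-host/flightdb \
  --username dbuser -P \
  --table airplanes \
  --cassandra-keyspace flightkeyspace \
  --cassandra-column-family airplanes \
  --cassandra-row-key name \
  --cassandra-thrift-host 127.0.0.1 \
  --cassandra-create-schema
```

The --cassandra-* options are the DSE additions to standard Sqoop; the rest of the command is the usual Sqoop import syntax you would use for any JDBC source.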

ETL Tools

If you need more sophistication in a data movement scenario (i.e., more than just extract and load), you can use any number of extract-transform-load (ETL) solutions that now support Cassandra. These tools provide excellent transformation routines that let you manipulate source data in virtually any way you need and then load it into a Cassandra target. They also supply many other features, such as visual point-and-click interfaces, scheduling engines, and more.

Happily, many ETL vendors that support Cassandra offer free community editions of their products that can handle many different use cases. Enterprise editions are also available that supply additional compelling features that serious enterprise data users need.

You can freely download and try ETL tools from Jaspersoft, Pentaho, and Talend that all work with DataStax Enterprise and community Cassandra.


The good news is you have multiple methods to use for moving data from most any database source system into DataStax Enterprise and Cassandra. If there’s another ETL tool or similar product you’re using that doesn’t support DataStax Enterprise and/or Cassandra, please contact us and we’ll see what we can do to change that.

If you haven’t done so already, download a copy of DataStax Enterprise Edition, which contains a production-ready version of Cassandra, along with Hadoop for analytics and Solr for enterprise search. It’s completely free to use for as long as you like in your development environments.




  1. Amit says:

    Hello Robin,

    Thanks for this information. We are looking at using Cassandra as a cache that loads data from our own storage or external data sources. The external data sources complicate matters, as we may not know which keys will be relevant for our clients on a particular day. So we need to load data into Cassandra by contacting the external data source when a key is not present, or when a time interval has passed since the previous load.
    Is there any mechanism to do this?

  2. Amarnadh Jagu says:

    I am also looking for how I should sync RDBMS data relationships in Cassandra.

    I just want to use Cassandra to store my business data without disturbing the relationships between the primary tables and child tables.

  3. john says:

    How can we do COPY to CSV if I have custom data types, e.g., an address with street name and zip?

  4. Boreo says:

    Hi. I want to insert a CSV file into a Cassandra table which might contain null data. I tried, but it gives errors like “line contains NULL byte” and “%d format: a number is required, not NoneType”. How should I declare this using COPY options?

  5. Challa says:

    Using Apache Sqoop, is there a way to directly transfer data from an RDBMS (e.g., Oracle) to Apache Cassandra (not using DSE), instead of a two-step process like RDBMS to HDFS and then HDFS to Cassandra?

  6. GG says:

    How do we move ScyllaDB data into Cassandra?

  7. Richard says:

    Hello, thanks for your info.
    By the way, I wonder about performance when the whole dataset is copied to CSV files by the COPY command. Is it safe to use while the service is live?

