Moving data to or from other databases
DataStax offers several solutions for migrating from other databases:
- The COPY command, which mirrors what the PostgreSQL RDBMS uses for file export/import.
- Apache Sqoop, which is a tool designed to transfer data between an RDBMS and Hadoop. DataStax Enterprise modified Sqoop so that you can transfer data from an RDBMS not only to a Hadoop node in a DataStax Enterprise cluster but also directly into Cassandra.
- The DSE Search/Solr Data Import Handler, which is a configuration-driven method for importing data to be indexed for searching.
- The Cassandra bulk loader, which bulk loads external data into a cluster.
About the COPY command
You can use COPY in Cassandra’s CQL shell to load flat-file data into Cassandra and to write Cassandra data out to OS files. Nearly all RDBMSs have unload utilities that write table data to OS files, so COPY can load those exports.
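As a rough illustration of the flat-file round trip, the following Python sketch serializes a few rows to CSV text and builds the matching COPY statements to run in the CQL shell. The table name `demo.users`, its columns, and the file name `users.csv` are hypothetical, and the sketch assumes the CSV has a header line and uses the default comma delimiter.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize a list of dict rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def copy_from_statement(table, columns, path):
    """Build a cqlsh COPY ... FROM statement that loads a CSV file into a table."""
    cols = ", ".join(columns)
    return f"COPY {table} ({cols}) FROM '{path}' WITH HEADER = TRUE;"


def copy_to_statement(table, columns, path):
    """Build a cqlsh COPY ... TO statement that writes a table out to a CSV file."""
    cols = ", ".join(columns)
    return f"COPY {table} ({cols}) TO '{path}' WITH HEADER = TRUE;"


if __name__ == "__main__":
    # Hypothetical table and columns, used only for illustration.
    columns = ["id", "name"]
    rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

    # Text you would save as users.csv before running COPY ... FROM.
    print(rows_to_csv(rows, columns))

    # Statements to paste into the CQL shell.
    print(copy_from_statement("demo.users", columns, "users.csv"))
    print(copy_to_statement("demo.users", columns, "users_export.csv"))
```

The generated statements are meant to be run inside cqlsh; the Python code only prepares the file contents and the statement text, and does not connect to a cluster.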
If a data movement task needs more than a simple extract and load, you can use any of the extract-transform-load (ETL) solutions that now support Cassandra. These tools provide transformation routines that let you manipulate source data as needed and then load it into a Cassandra target. They also supply other features, such as visual point-and-click interfaces and scheduling engines.
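To make the "transform" step concrete, here is a minimal hand-written sketch of the kind of reshaping an ETL tool automates between the extract and the load. It does not represent any particular ETL product. The source columns (`id`, `first`, `last`) and the target column `full_name` are assumptions chosen for illustration.

```python
import csv
import io


def transform_rows(source_csv):
    """Reshape an RDBMS export: trim whitespace and merge two name columns.

    Input columns (assumed): id, first, last
    Output columns (assumed): id, full_name
    """
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", "full_name"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        first = row["first"].strip()
        last = row["last"].strip()
        writer.writerow({"id": row["id"].strip(), "full_name": f"{first} {last}"})
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical export from a source database.
    exported = "id,first,last\n1, Ada ,Lovelace\n2,Grace, Hopper\n"
    print(transform_rows(exported))
```

The output is plain CSV text, so it could be saved to a file and loaded with any CSV-based load path. A dedicated ETL tool adds scheduling, connectors, and a visual interface on top of this kind of logic.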
Many ETL vendors that support Cassandra supply free community editions of their products, which can handle many use cases. Enterprise editions with additional features are also available.
You can freely download and try ETL tools from Jaspersoft, Pentaho, and Talend, all of which work with DataStax Enterprise and community Cassandra.