DataStax Enterprise 2.1 Documentation

About Sqoop

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

Sqoop is an Apache Software Foundation tool for transferring data between Hadoop and an RDBMS or another type of data source, such as a NoSQL database.

DataStax Enterprise support for Sqoop lets you import data from an external data source into Hadoop, Hive, or Cassandra column families. A DSE node runs the Hadoop/Analytics workload, and a Hadoop job uses Sqoop to import the data from the data source.

You can import data from any JDBC-compliant data source. For example:

  • DB2
  • MySQL
  • Oracle
  • SQL Server
  • Sybase

You need a JDBC driver for the RDBMS or other type of data source that you import from.
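
Each JDBC driver expects its own connection URL format, which is the value that Sqoop takes as the connection string. The following is a minimal sketch, not part of the product: the helper function jdbc_url is hypothetical, the ports are the vendors' usual defaults, and the host and database names are placeholders. Check the URL syntax against the documentation for the driver you install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a typical JDBC connection URL for a vendor.
# Ports are common defaults and may differ in your installation.
jdbc_url() {
  local vendor="$1" host="$2" db="$3"
  case "$vendor" in
    mysql)     echo "jdbc:mysql://${host}:3306/${db}" ;;
    # For the Oracle thin driver, the last component is the SID.
    oracle)    echo "jdbc:oracle:thin:@${host}:1521:${db}" ;;
    sqlserver) echo "jdbc:sqlserver://${host}:1433;databaseName=${db}" ;;
    db2)       echo "jdbc:db2://${host}:50000/${db}" ;;
    sybase)    echo "jdbc:sybase:Tds:${host}:5000/${db}" ;;
    *)         echo "unknown vendor: ${vendor}" >&2; return 1 ;;
  esac
}

# Placeholder host and database name.
jdbc_url mysql 127.0.0.1 example_db
```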

Getting Started

To get started using Sqoop, first run the Sqoop demo to migrate data from a MySQL table to text files in the Cassandra File System (CFS).

Next, see how to extend the basic dse sqoop import command used by the demo to migrate data to a Cassandra column family.
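
As a rough illustration of the difference between the two kinds of import, the sketch below assembles both commands and prints them instead of running them. The connection URL, user, table, directory, keyspace, column family, and row key are all hypothetical placeholders. The --connect, --username, --table, and --target-dir options are standard Sqoop import options; the --cassandra-* option names are given here as an assumption about the Cassandra additions and should be checked against the online help for your DSE version.

```shell
#!/usr/bin/env bash
# Dry run: build the dse sqoop import commands and print them, not run them.
# Every value below is a hypothetical placeholder.
connect_url="jdbc:mysql://127.0.0.1/example_db"
db_user="example_user"
table="example_table"

# Import to text files in CFS, using standard Sqoop options.
cfs_cmd=(dse sqoop import
  --connect "$connect_url"
  --username "$db_user"
  --table "$table"
  --target-dir "/example_dir")

# Import to a Cassandra column family. The --cassandra-* names are an
# assumption; confirm them in the online help before use.
cassandra_cmd=(dse sqoop import
  --connect "$connect_url"
  --username "$db_user"
  --table "$table"
  --cassandra-keyspace "example_ks"
  --cassandra-column-family "example_cf"
  --cassandra-row-key "id"
  --cassandra-thrift-host "127.0.0.1"
  --cassandra-create-schema)

echo "${cfs_cmd[*]}"
echo "${cassandra_cmd[*]}"
```

To run a command for real rather than print it, you would replace the echo line with "${cfs_cmd[@]}" or "${cassandra_cmd[@]}" and supply the database credentials that your data source requires.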

Finally, review the full set of Sqoop commands listed in the online help, along with the Cassandra additions.