DataStax Enterprise 3.1 Documentation

About Sqoop

Sqoop is an Apache Software Foundation tool for transferring data between an RDBMS data source and Hadoop, or between Hadoop and other data sources, such as NoSQL databases.

DataStax Enterprise support for Sqoop lets you import data from an external data source into Hadoop, Hive, or Cassandra tables. A DSE node runs the Hadoop/Analytics workload, and the Hadoop job uses Sqoop to import data from the data source.

Running the Sqoop demo

To get started using Sqoop, first run the Sqoop demo to import data from a MySQL table to text files in the Cassandra File System (CFS).

Importing data

You can import data from any JDBC-compliant data source. For example:

  • DB2
  • MySQL
  • Oracle
  • SQL Server
  • Sybase

You need a JDBC driver for the RDBMS or other type of data source.
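
As a rough sketch of what such an import looks like, the helper below assembles a dse sqoop import command line from a JDBC connection string and prints it without running it. The function name build_import_cmd and all values passed to it (database mydb, table mytable, user root, target directory /mytable) are hypothetical placeholders. The options shown (--connect, --username, --table, --target-dir) are standard Sqoop import options; check the online help for what your version accepts.

```shell
#!/bin/sh
# Hypothetical helper: builds (but does not run) a dse sqoop import command.
# Every argument value used below is a placeholder, not a documented default.
build_import_cmd() {
    jdbc_url=$1     # JDBC connection string for the source database
    db_user=$2      # database user name
    table=$3        # source table to import
    target_dir=$4   # destination directory in CFS
    printf './dse sqoop import --connect %s --username %s --table %s --target-dir %s\n' \
        "$jdbc_url" "$db_user" "$table" "$target_dir"
}

# Print the command for review before running it from <install_location>/bin.
build_import_cmd "jdbc:mysql://127.0.0.1/mydb" "root" "mytable" "/mytable"
```

Printing the command first makes it easy to verify the connection string and target directory before starting a Hadoop job.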

Migrating data to a Cassandra table

After importing data into text files in CFS, see how to expand the basic dse sqoop import command used by the demo to migrate data to a Cassandra table.

Finally, review the full set of Sqoop command options listed in the online help, including the Cassandra-specific additions.
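
To give a sense of how the basic import command might be expanded, the sketch below appends Cassandra-specific options to a base command and prints the result without running it. The option names (--cassandra-keyspace, --cassandra-column-family, --cassandra-row-key, --cassandra-thrift-host, --cassandra-create-schema) are assumed from the DSE additions to Sqoop and should be confirmed against the online help for your installation. The function name cassandra_import_opts and all keyspace, column family, key, and host values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: formats the Cassandra-specific options for an import.
# Option names are assumptions; confirm them with ./dse sqoop import --help.
cassandra_import_opts() {
    keyspace=$1        # target keyspace
    column_family=$2   # target column family (table)
    row_key=$3         # source column to use as the row key
    thrift_host=$4     # Cassandra node to connect to
    printf '%s\n' "--cassandra-keyspace $keyspace --cassandra-column-family $column_family --cassandra-row-key $row_key --cassandra-thrift-host $thrift_host --cassandra-create-schema"
}

# Placeholder base command; the database, user, and table names are illustrative.
base_cmd="./dse sqoop import --connect jdbc:mysql://127.0.0.1/mydb --username root --table mytable"

# Print the expanded command for review; nothing is executed here.
echo "$base_cmd $(cassandra_import_opts my_ks my_cf id 127.0.0.1)"
```

Compared with the text-file import, the target directory is replaced by options that name the destination keyspace, column family, and row key.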

Getting information about the sqoop command

Use the --help option of the dse sqoop import command to get online help on Sqoop command-line options. For example, on the Mac:

cd <install_location>/bin

./dse sqoop import --help