DataStax Enterprise 3.0 Documentation

Running the Sqoop demo

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

Latest DSE documentation | Earlier DSE documentation

The Sqoop demo migrates the data from a MySQL table to text files in CFS. The Sqoop data migration demo uses the MySQL database and data from the North American Numbering Plan. This data consists of the area-code (NPA) and telephone number (Nxx) for the USA and Canada.

Demo requirements

To run the demo, you need:

  • An installation of JDK 1.6.x. The JRE alone will not work.
  • An installation of MySQL
  • Sufficient MySQL database privileges to create database objects
  • A JDBC driver for MySQL in the sqoop/lib directory
  • The connection string that is appropriate for the JDBC driver
  • One or more DSE nodes running the Analytics workload to run the Hadoop job that actually imports data from the external data source
  • A PATH environment variable that includes the bin directory of the DSE installation

Step-by-step procedure

To run the Sqoop demo on a single node on a Mac, for example, follow these steps.

  1. Check the JDK 1.6.x is installed.

  2. Install MySQL and download the JDBC driver for MySQL from the MySQL site. This example uses mysql-connector-java-5.0.8-bin.jar.

  3. Copy the JDBC driver for MySQL to the sqoop/lib directory.

    • RHEL or Debian installations

      /usr/share/dse-demos/sqoop/lib

    • Tar distribution, such as Mac

      <install_location>/demos/sqoop/lib

  4. Put the connector in a directory included in the Sqoop classpath, such as the resources/sqoop subdirectory of your DataStax Enterprise (DSE) installation.

  5. On the command line, start the MySQL daemon. For example:

    sudo ./mysqld_safe --user=mysql
    
  6. Start MySQL and create the demo database:

    sudo ./mysql
    
    mysql> CREATE DATABASE npa_nxx_demo;
    
  7. Then connect to the database and create the table:

    mysql> USE npa_nxx_demo;
    
    mysql> CREATE TABLE npa_nxx (
             npa_nxx_key varchar(16) NOT NULL,
             npa         varchar(3)  DEFAULT NULL,
             nxx         varchar(3)  DEFAULT NULL,
             lat         varchar(8)  DEFAULT NULL,
             lon         varchar(8)  DEFAULT NULL,
             linetype    varchar(1)  DEFAULT NULL,
             state       varchar(2)  DEFAULT NULL,
             city        varchar(36) DEFAULT NULL,
             PRIMARY KEY (npa_nxx_key)
           ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
  8. Locate the demos/sqoop directory.

    The location of the demo directory depends on your platform:

    • RHEL or Debian installations

      /usr/share/dse-demos/sqoop

    • Tar distribution, such as Mac

      <install_location>/demos/sqoop

  9. Populate the table by loading the CSV file in the demos/sqoop directory.

    mysql> LOAD DATA LOCAL INFILE 'npa_nxx.csv'
             INTO TABLE npa_nxx_demo.npa_nxx
             FIELDS TERMINATED BY ','
             ENCLOSED BY '"'
             LINES TERMINATED BY '\n';
    

    MySQL returns the following message:

    Query OK, 105291 rows affected (1.01 sec) Records: 105291 Deleted: 0 Skipped: 0 Warnings: 0

  10. Start DSE as an analytics node. The method you use depends on your platform:

    • RHEL or Debian installations

      Edit /etc/default/dse, set HADOOP_ENABLED=1, and start the dse service.

    • Tar distribution, such as Mac

      Make the bin directory in the DSE installation directory, the current directory and run the dse cassandra command using the -t option to start dse. The -t option starts Hadoop and marks the node for Analytics.

  11. Use the dse command in the bin directory to migrate the data from the MySQL table to text files in the CFS directory, npa_nxx.

    Use the database username and password or -P instead of --password to be prompted for the database password. If the database account is not password-protected, just omit the password option.

    sudo ./dse sqoop import --connect jdbc:mysql://127.0.0.1/npa_nxx_demo \
      --username root                                                     \
      --password <password>                                               \
      --table npa_nxx                                                     \
      --target-dir /npa_nxx
    

    DSE returns this message: INFO mapreduce.ImportJobBase: Retrieved 105291 records.

To run this demo and import data to nodes in a cluster, the database permissions must be granted to the nodes. For example, use the GRANT ALL command to grant MySQL access to the hosts.