DataStax Enterprise 3.0 Documentation

Using the DataStax ODBC driver for Hive on Windows

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

Latest DSE documentation | Earlier DSE documentation

The DataStax ODBC Driver for Hive provides Windows users access to the information stored in the Hadoop distribution bundled into DataStax Enterprise. This driver allows you to access the data stored on your DataStax Enterprise Hadoop nodes using business intelligence (BI) tools, such as Tableau and Microsoft Excel. The driver is compliant with the latest ODBC 3.52 specification and automatically translates any SQL-92 query into HiveQL.

Prerequisites

  • Windows® 7 Professional or Windows® 2008 R2. Both 32- and 64-bit editions are supported.
  • Microsoft Visual C++ 2010 runtime.
  • A cluster with a Hadoop node running the Hive server. See Starting the Hive server.

Installing the driver

To install the DataStax ODBC driver on a Windows platform:

  1. Download the driver from Client Libraries and CQL Drivers.
  2. Double-click the downloaded file and follow the wizard's instructions.

Configuring the driver

Set up the DataStax ODBC driver for access by your BI tool.

  1. Click Start > Program Files > DataStax Hive ODBC Connector > ODBC Driver Manager.

  2. Click the Drivers tab to verify that the driver is present.


    ../../_images/verify_odbc.png
  3. Create either a User or System DSN (data source name) for your BI tool connection.

    1. Click the User DSN or System DSN tab.

    2. Click Add, select DataStax Hive ODBC Connector, and then click Finish.

    3. In DataStax Hive ODBC Connector Setup, enter the following:

      Data Source Name The name for your DSN.
      Description Optional.
      Host IP or hostname of your Hive server.
      Port Listening port for the Hive service.
      Database By default, all tables reside within the default database. To check for the appropriate database, use the show databases Hive command.
    4. Click Test.

      The test results are displayed.

      Note

      If your DataStax Enterprise cluster is on Amazon EC2, you must open the listing port for the Hive Server. For more information, refer to the latest AMI documentation.

  4. To configure the advanced options, see Appendix C in the DataStax Hive ODBC Connector User Guide for Windows:

    Start > Program Files > DataStax Hive ODBC Connector > User's Guide

Using the DataStax ODBC driver for Hive

After configuring the ODBC data source for Hive, you can connect and pull data from Hive using any compliant BI tool. For example, to retrieve data using Microsoft Excel:

  1. Use the data connection wizard to select your new ODBC data source:


    ../../_images/excel_connect.png
  2. In Connect to OBDC Data Source, select DSE2 Hive, and then click Next.

  3. Select one or more data objects (or construct a query) to retrieve the data, and then click Finish.


    ../../_images/result.png

    After the ODBC query is executed and the data is retrieved, a Hive MapReduce job runs on the server:

    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201208230939_0006,
       Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201208230939_0006
    Kill Command = ./dse hadoop job
       -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201208230939_0006
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2012-08-23 12:44:39,795 Stage-1 map = 0%,  reduce = 0%
    2012-08-23 12:44:42,824 Stage-1 map = 100%,  reduce = 0%
    2012-08-23 12:44:44,833 Stage-1 map = 100%,  reduce = 100%
    Ended Job = job_201208230939_0006
    MapReduce Jobs Launched:
    Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec