DataStax Blog

Using the DataStax ODBC Driver for Hive

By Robin Schumacher -  December 3, 2012 | 0 Comments

By Robin Schumacher | December 3, 2012

Users of DataStax software now have a new way to access and analyze information stored in the Hadoop distribution that’s bundled into DataStax Enterprise (DSE). Using the DataStax ODBC driver for Hive, users can connect to a DataStax Enterprise cluster with their favorite BI tools (e.g. Tableau, Microsoft Excel, etc.) and access data stored on DSE Hadoop nodes.

The DataStax ODBC driver was developed by ODBC software leader Simba Corporation and is compliant with the latest ODBC 3.52 specification, meaning it is completely up to date and state of the art. The ODBC driver runs on both 32 and 64-bit platforms and automatically translates any SQL-92 query into HiveQL.

Let’s take a quick walk through of how it works.

Ensuring the Server is Ready

With DSE, you’re able to have a single cluster with real-time data via Cassandra, analytic data with Hadoop, and enterprise search data with Solr. To use the DataStax ODBC driver for Hive, you want to first make sure you have one or more Hadoop nodes in a cluster running and also have the Hive server started (using this command: (./dse hive --service hiveserver).

Installing the DataStax ODBC Driver for Hive

The DataStax ODBC driver for Hive is supported for Windows and Linux platforms, and can be downloaded from the DataStax website.

Installing the DataStax ODBC driver for Hive is simple. The only prerequisite needed for Windows is to have the Microsoft Visual C++ 2010 runtime installed for the appropriate platform (either 32 or 64-bit).

The installation of the driver is completed in just a few steps.

DataStax ODBC Driver for Hive

You will need a password supplied from DataStax to unlock the driver and install it on your system.

DataStax ODBC Driver for Hive

Once the password is entered, the ODBC driver installation from that point is very straightforward with just a few specifications being needed after which the driver is installed:

DataStax ODBC Driver for Hive

Once installation is completed, you can review the user’s guide that’s supplied:

DataStax ODBC Driver for Hive

Configuring the DataStax Hive ODBC Driver

Configuring the DataStax ODBC driver for Hive is easy. On Windows, you invoke the ODBC Data Source Administrator utility and first validate that the driver is present:

DataStax ODBC Driver for Hive

Then, you create either a User or System DSN (data source name) that will be used by front end business intelligence tools:

DataStax ODBC Driver for Hive

You’ll need to enter a name for the datasource as well as the IP or host name of the DSE node you want to connect to:

DataStax ODBC Driver for Hive

Once the information is entered, you can test your connection:

DataStax ODBC Driver for Hive

Using the DataStax ODBC Driver for Hive

Once you’ve successfully configured your ODBC datasource for Hive, you can use it in your preferred BI tools to connect to and pull data back from Hive. For example, to use the DataStax ODBC driver for Hive with Microsoft Excel, you can use the data connection wizard to select your new ODBC datasource:

DataStax ODBC Driver for Hive

DataStax ODBC Driver for Hive

And then, select one or more data objects (or construct a query) to pull back data:

DataStax ODBC Driver for Hive

DataStax ODBC Driver for Hive

When the ODBC query from Excel is executed and data is retrieved, a Hive MapReduce job is kicked off on the server:


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201208230939_0006, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201208230939_0006
Kill Command = ./dse hadoop job  -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201208230939_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-08-23 12:44:39,795 Stage-1 map = 0%,  reduce = 0%
2012-08-23 12:44:42,824 Stage-1 map = 100%,  reduce = 0%
2012-08-23 12:44:44,833 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201208230939_0006
MapReduce Jobs Launched: 
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 SUCESS
Total MapReduce CPU Time Spent: 0 msec

Conclusions

Getting data back from Hadoop Hive into your favorite BI tool is easy with the DataStax ODBC Driver. For more information see the online documentation and for downloads of the driver, visit the client drivers downloads page on the DataStax website.



Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>