Using the DataStax ODBC Driver for Hive
Users of DataStax software now have a new way to access and analyze information stored in the Hadoop distribution that’s bundled into DataStax Enterprise (DSE). Using the DataStax ODBC driver for Hive, users can connect to a DataStax Enterprise cluster with their favorite BI tools (e.g. Tableau, Microsoft Excel, etc.) and access data stored on DSE Hadoop nodes.
The DataStax ODBC driver was developed by ODBC software leader Simba Corporation and is compliant with the latest ODBC 3.52 specification, meaning it is completely up to date and state of the art. The ODBC driver runs on both 32 and 64-bit platforms and automatically translates any SQL-92 query into HiveQL.
Let’s take a quick walk through of how it works.
Ensuring the Server is Ready
With DSE, you’re able to have a single cluster with real-time data via Cassandra, analytic data with Hadoop, and enterprise search data with Solr. To use the DataStax ODBC driver for Hive, you want to first make sure you have one or more Hadoop nodes in a cluster running and also have the Hive server started (using this command: (
./dse hive --service hiveserver).
Installing the DataStax ODBC Driver for Hive
The DataStax ODBC driver for Hive is supported for Windows and Linux platforms, and can be downloaded from the DataStax website.
Installing the DataStax ODBC driver for Hive is simple. The only prerequisite needed for Windows is to have the Microsoft Visual C++ 2010 runtime installed for the appropriate platform (either 32 or 64-bit).
The installation of the driver is completed in just a few steps.
You will need a password supplied from DataStax to unlock the driver and install it on your system.
Once the password is entered, the ODBC driver installation from that point is very straightforward with just a few specifications being needed after which the driver is installed:
Once installation is completed, you can review the user’s guide that’s supplied:
Configuring the DataStax Hive ODBC Driver
Configuring the DataStax ODBC driver for Hive is easy. On Windows, you invoke the ODBC Data Source Administrator utility and first validate that the driver is present:
Then, you create either a User or System DSN (data source name) that will be used by front end business intelligence tools:
You’ll need to enter a name for the datasource as well as the IP or host name of the DSE node you want to connect to:
Once the information is entered, you can test your connection:
Using the DataStax ODBC Driver for Hive
Once you’ve successfully configured your ODBC datasource for Hive, you can use it in your preferred BI tools to connect to and pull data back from Hive. For example, to use the DataStax ODBC driver for Hive with Microsoft Excel, you can use the data connection wizard to select your new ODBC datasource:
And then, select one or more data objects (or construct a query) to pull back data:
When the ODBC query from Excel is executed and data is retrieved, a Hive MapReduce job is kicked off on the server:
Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201208230939_0006, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201208230939_0006 Kill Command = ./dse hadoop job -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201208230939_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2012-08-23 12:44:39,795 Stage-1 map = 0%, reduce = 0% 2012-08-23 12:44:42,824 Stage-1 map = 100%, reduce = 0% 2012-08-23 12:44:44,833 Stage-1 map = 100%, reduce = 100% Ended Job = job_201208230939_0006 MapReduce Jobs Launched: Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 SUCESS Total MapReduce CPU Time Spent: 0 msec
Getting data back from Hadoop Hive into your favorite BI tool is easy with the DataStax ODBC Driver. For more information see the online documentation and for downloads of the driver, visit the client drivers downloads page on the DataStax website.