The DataStax ODBC Driver for Hive provides Windows users with access to the information stored in the Hadoop distribution bundled with DataStax Enterprise. This driver allows you to access the data stored on your DataStax Enterprise Hadoop nodes using business intelligence (BI) tools, such as Tableau and Microsoft Excel. The driver complies with the latest ODBC 3.52 specification and automatically translates any SQL-92 query into HiveQL.
To install the DataStax ODBC driver on a Windows platform:
To set up the DataStax ODBC driver for access by your BI tool:
Click Start > Program Files > DataStax Hive ODBC Connector > ODBC Driver Manager.
Click the Drivers tab to verify that the driver is present.
Create either a User or System DSN (data source name) for your BI tool connection.
Click the User DSN or System DSN tab.
Click Add, select DataStax Hive ODBC Connector, and then click Finish.
In DataStax Hive ODBC Connector Setup, enter the following:
| Field | Description |
| --- | --- |
| Data Source Name | The name for your DSN. |
| Host | The IP address or hostname of your Hive server. |
| Port | The listening port for the Hive service. |
| Database | By default, all tables reside in the default database. To check for the appropriate database, use the show databases Hive command. |
Click Test to verify the connection. The test results are displayed.
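The fields above map directly onto an ODBC connection string, which some clients accept in place of a saved DSN. The following is a minimal sketch; the driver name, host, and port below are placeholder assumptions, so check the Drivers tab of the ODBC Driver Manager for the exact driver name registered on your system.

```python
def hive_odbc_connection_string(host, port=10000, database="default"):
    """Build a DSN-less ODBC connection string for the Hive driver.

    The driver name is an assumption; it must match the name shown
    on the Drivers tab of the ODBC Driver Manager.
    """
    return (
        "Driver={DataStax Hive ODBC Connector};"
        f"Host={host};Port={port};Database={database}"
    )

# Example with a placeholder host address:
print(hive_odbc_connection_string("10.0.0.5"))
```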
If your DataStax Enterprise cluster is on Amazon EC2, you must open the listening port for the Hive server. For more information, refer to the latest AMI documentation.
To configure the advanced options, see Appendix C in the DataStax Hive ODBC Connector User Guide for Windows:
Start > Program Files > DataStax Hive ODBC Connector > User's Guide
After configuring the ODBC data source for Hive, you can connect and pull data from Hive using any compliant BI tool. For example, to retrieve data using Microsoft Excel:
Use the data connection wizard to select your new ODBC data source:
In Connect to ODBC Data Source, select DSE2 Hive, and then click Next.
Select one or more data objects (or construct a query) to retrieve the data, and then click Finish.
After the ODBC query is executed and the data is retrieved, a Hive MapReduce job runs on the server:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201208230939_0006, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201208230939_0006
Kill Command = ./dse hadoop job -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201208230939_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-08-23 12:44:39,795 Stage-1 map = 0%, reduce = 0%
2012-08-23 12:44:42,824 Stage-1 map = 100%, reduce = 0%
2012-08-23 12:44:44,833 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201208230939_0006
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
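Excel is not the only option: once the DSN exists, any ODBC-aware client can query Hive through it. The following is a rough sketch using Python's pyodbc package; the DSN name DSE2 Hive and the table name users are illustrative assumptions, not values defined by this guide.

```python
def build_query(table, limit=10):
    # Placeholder table name; LIMIT is supported by HiveQL.
    return f"SELECT * FROM {table} LIMIT {limit}"

def fetch_rows(dsn="DSE2 Hive", table="users", limit=10):
    """Run a small query through the ODBC DSN and return the rows.

    Requires the pyodbc package and the DataStax Hive ODBC driver
    configured as described above; imported lazily so the query
    builder can be used on its own.
    """
    import pyodbc
    conn = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        return conn.cursor().execute(build_query(table, limit)).fetchall()
    finally:
        conn.close()
```

With the DSN created in the setup steps, calling fetch_rows() issues the query through the driver and triggers the same server-side Hive MapReduce job shown above.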