Getting started with Shark
You can use Shark just as you use Hive. The following example assumes that you ran the Portfolio Manager demo using Hadoop to generate the data for the example. For more examples, refer to the Hive documentation. The backend implementations of Hive and Shark differ, but the user interface and query language are interchangeable for the most part.
- Start DataStax Enterprise in Spark mode.
- Start Shark.
$ dse shark
Starting the Shark Command Line Client
. . .
2014-03-14 20:37:09.315:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:4040
Reloading cached RDDs from previous Shark sessions... (use -skipRddReload flag to skip reloading)
- Enter these queries to analyze the portfolio data.
shark> USE PortfolioDemo;
OK
Time taken: 0.384 seconds
shark> DESCRIBE StockHist;

Output is:

OK
key      string  from deserializer
column1  string  from deserializer
value    double  from deserializer
Time taken: 0.208 seconds
- Continue querying the data by selecting the count from the Stocks table, and then select ten stocks, ordered by value in descending order.
shark> SELECT count(*) FROM Stocks;
OK
2759
Time taken: 9.899 seconds
shark> SELECT * FROM Stocks ORDER BY value DESC LIMIT 10;
OK
XIN price 99.95643836954761
JQC price 99.92873883263657
SBH price 99.87928626341066
CCJ price 99.83980527070464
QXM price 99.72161816290533
DPC price 99.70004934561737
AVT price 99.69106570398871
ANW price 99.69009660302422
PMO price 99.67491825839043
WMT price 99.67281873305834
Time taken: 2.204 seconds
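As a variation on the queries above, the following sketch names the columns explicitly and filters on the second column instead of selecting every column. It assumes that the Stocks table exposes the same key, column1, and value columns that DESCRIBE reported for StockHist, and that 'price' is the column1 label of the rows of interest, as the sample output suggests. Neither assumption is confirmed here, so run DESCRIBE Stocks; first to check.

```sql
-- Assumption: Stocks has the columns key, column1, and value,
-- like StockHist above. Verify with DESCRIBE Stocks; before running.
SELECT key, value
FROM Stocks
WHERE column1 = 'price'   -- assumed label, taken from the sample output
ORDER BY value DESC
LIMIT 5;
```

If the assumptions hold, this returns only the stock symbol and its value for the five highest-valued rows, which keeps the output narrower than SELECT *.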
- Use the EXPLAIN command in Shark to get specific Hive and Shark information.
shark> EXPLAIN SELECT * FROM Stocks ORDER BY value DESC LIMIT 10;
The output first lists some Hive information in the abstract syntax tree, followed by the Shark query plan. At this point, the Spark Worker page lists the Shark application that you are running.
- Exit Shark.