DataStax Enterprise 4.5

Getting started with Shark

You can use Shark just as you use Hive. The following example shows how to run the Portfolio demo using Hadoop. The Portfolio demo sets up tables in Cassandra and inserts data into the tables. You then use Shark to query the tables.

Running the pricer script sets up CQL tables for the demo. To see the pricer script options, run this command on the operating system command line in the install_home/demos/portfolio_manager directory:

$ bin/pricer -h

The output includes options for specifying the host node, user name, password, and operation.

-d <node name>, --nodes=<node name>
One or more comma-separated IP addresses of host nodes for the pricer operation. The default is localhost.
-o <operation>, --operation=<operation>
The operation to perform, such as INSERT_PRICES or UPDATE_PORTFOLIOS. The default is INSERT_PRICES.
-P <password>, --password=<password>
The Cassandra password if password authentication is configured.
-U <user name>, --username=<user name>
The Cassandra user name if password authentication is configured.

If you run the demo on a multinode cluster, the pricer might need the IP address of a host node to connect to Cassandra. The user name and password are required if authentication is enabled.
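Because the host and credential options are needed only in some setups, a wrapper can add them conditionally. The following minimal Bash sketch, which is not part of the demo, prints the pricer command line for one operation using only the -d, -U, -P, and -o options listed above. The function name pricer_cmd and the variables PRICER_HOSTS, PRICER_USER, and PRICER_PASSWORD are hypothetical; the pricer script does not read them.

```shell
#!/usr/bin/env bash
# Sketch only: print a pricer command line for one operation.
# PRICER_HOSTS, PRICER_USER, and PRICER_PASSWORD are hypothetical variable
# names used by this helper; the pricer script itself does not read them.
pricer_cmd() {
  local operation="$1"
  shift                            # remaining arguments are passed through
  local -a cmd=(bin/pricer)

  # -d: comma-separated host node addresses (pricer defaults to localhost)
  if [[ -n "${PRICER_HOSTS:-}" ]]; then
    cmd+=(-d "$PRICER_HOSTS")
  fi

  # -U and -P: needed only when password authentication is configured
  if [[ -n "${PRICER_USER:-}" ]]; then
    cmd+=(-U "$PRICER_USER" -P "${PRICER_PASSWORD:-}")
  fi

  cmd+=(-o "$operation" "$@")
  printf '%s\n' "${cmd[*]}"        # print instead of running (dry run)
}
```

For example, running PRICER_HOSTS=10.11.12.2 pricer_cmd UPDATE_PORTFOLIOS prints bin/pricer -d 10.11.12.2 -o UPDATE_PORTFOLIOS. The printed line is only a preview and does not quote arguments. Also keep in mind that a password passed with -P can appear in shell history and process listings.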

Procedure

  1. Start DataStax Enterprise in Spark and Hadoop mode.
    On Linux, for example:
    $ bin/dse cassandra -t -k 
  2. Navigate to the demos directory.
    • Installer-Services and Package installations: /usr/share/dse/demos/portfolio_manager
    • Installer-No Services and Tarball installations: install_location/demos/portfolio_manager
  3. On the operating system command line, run the script to create a table of stocks in Cassandra and insert prices into the table.
    $ bin/pricer -o INSERT_PRICES
    
    14/03/14 19:20:12 INFO snitch.Workload: Setting my workload to Cassandra
    Created keyspaces. Sleeping 1s for propagation.
    total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
    10000,1000,1000,0.0320469,9
    Include the IP address of a host node if you are running Spark on a multi-node cluster. Include a user name and password if authentication is configured. For example:
    $ bin/pricer -d 10.11.12.2 -U johndoe -P mypassword -o INSERT_PRICES
  4. Run the pricer script two more times to populate more Cassandra tables for the demo.
    $ bin/pricer -o UPDATE_PORTFOLIOS 
    
    14/03/14 19:21:52 INFO snitch.Workload: Setting my workload to Cassandra
    Keyspace names must be case-insensitively unique ("PortfolioDemo" conflicts with "PortfolioDemo")
    total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
    407,40,40,0.23466093366093366,11
    1844,143,143,0.08896659707724426,22
    3368,152,152,0.08219750656167979,33
    5139,177,177,0.07158215697346132,44
    6950,181,181,0.06904583103257869,55
    8918,196,196,0.062348577235772355,65
    10000,108,108,0.06035674676524954,72
    
    $ bin/pricer -o INSERT_HISTORICAL_PRICES -n 100
    
    14/03/14 19:26:42 INFO snitch.Workload: Setting my workload to Cassandra
    Keyspace names must be case-insensitively unique ("PortfolioDemo" conflicts with "PortfolioDemo")
    total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
    53,5,15094,3.278,13
    100,4,13385,3.0340212765957446,15
  5. Start Shark.
    $ dse shark
    Starting the Shark Command Line Client
    . . .
    2014-03-14 20:37:09.315:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:4040
    Reloading cached RDDs from previous Shark sessions... (use -skipRddReload flag to skip reloading)
  6. Enter these queries to analyze the portfolio data.
    shark> USE PortfolioDemo;
    OK
    Time taken: 0.384 seconds
    
    shark> DESCRIBE StockHist;
    Output is:
    OK
    key                   string                from deserializer   
    column1               string                from deserializer   
    value                 double                from deserializer   
    Time taken: 0.208 seconds
  7. Continue querying the data: count the rows in the Stocks table, and then select the ten stocks with the highest values.
    shark> SELECT count(*) FROM Stocks;
    OK
    2759
    Time taken: 9.899 seconds
    
    shark> SELECT * FROM Stocks ORDER BY value DESC LIMIT 10;
    OK
    XIN price 99.95643836954761
    JQC price 99.92873883263657
    SBH price 99.87928626341066
    CCJ price 99.83980527070464
    QXM price 99.72161816290533
    DPC price 99.70004934561737
    AVT price 99.69106570398871
    ANW price 99.69009660302422
    PMO price 99.67491825839043
    WMT price 99.67281873305834
    Time taken: 2.204 seconds
  8. Use the EXPLAIN command in Shark to display Hive and Shark query plan information.
    shark> EXPLAIN SELECT * FROM Stocks ORDER BY value DESC LIMIT 10;
    The output first lists Hive information, including the abstract syntax tree, and then the Shark query plan. At this point, the Spark Worker page lists the Shark application that you are running.
  9. Exit Shark.
    shark> exit;
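You can also script the three pricer runs from steps 3 and 4. The following minimal Bash sketch is not part of the demo. It runs the operations in the same order, stops at the first run that exits with an error, and prints the last progress row (total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time) from each run. It assumes the working directory is the portfolio_manager demo directory and that pricer writes its progress rows to standard output. The PRICER variable and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: load the Portfolio demo data with the three pricer runs from
# the procedure. PRICER is a hypothetical override; the default is bin/pricer,
# relative to the portfolio_manager demo directory.

# Print the last progress row (total,interval_op_rate,interval_key_rate,
# avg_latency,elapsed_time) from pricer output read on standard input.
# Log lines and the header row do not match the pattern and are skipped.
last_progress() {
  awk '/^[0-9]+,/ { last = $0 } END { if (last != "") print last }'
}

# Run one pricer operation and print its final progress row.
# Returns non-zero if pricer itself exits with an error.
run_step() {
  local output summary
  if ! output="$("${PRICER:-bin/pricer}" "$@")"; then
    echo "pricer $* failed" >&2
    return 1
  fi
  summary="$(last_progress <<<"$output")"
  printf '%s -> %s\n' "$*" "${summary:-no progress rows}"
}

# Run the three operations in order, stopping at the first failure.
# Extra arguments (for example, -d <node name>) are passed to every run.
load_demo_data() {
  run_step -o INSERT_PRICES "$@" &&
    run_step -o UPDATE_PORTFOLIOS "$@" &&
    run_step -o INSERT_HISTORICAL_PRICES -n 100 "$@"
}
```

From the demo directory, source the file and run load_demo_data, adding options such as -d 10.11.12.2 if your cluster needs them. Because the script captures each run's output before summarizing it, the progress rows appear only after each run finishes. Setting PRICER to a stub command that prints sample output lets you try the script without a running cluster.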