Built into DataStax Enterprise is an enhanced Hadoop distribution that is fully compatible with existing HDFS, Hadoop, and Hive tools and utilities. This topic covers starting Hadoop, running the Portfolio Manager demo, and enabling Hadoop to connect to external addresses.
Use the following command to run Hadoop file system commands:
dse hadoop fs <args>
where the available <args> are described in the HDFS File System Shell Guide on the Apache Hadoop web site.
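For example, the standard file system shell arguments work unchanged (the paths here are placeholders):

dse hadoop fs -ls /
dse hadoop fs -mkdir /user/data
dse hadoop fs -put localfile.txt /user/data/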
For information on starting Hive or Pig, or on using Hadoop, see the related topics.
Your DataStax Enterprise (DSE) installation contains a Portfolio Manager sample application that demonstrates a mixed workload on a DSE cluster. The demo is located in /usr/share/dse-demos/portfolio_manager for packaged installations or <install_location>/demos/portfolio_manager for binary installations.
The portfolio manager application demonstrates a hybrid workflow using DataStax Enterprise. The use case is a financial application where users can actively create and manage a portfolio of stocks.
On the Cassandra OLTP (online transaction processing) side, each portfolio contains a list of stocks, the number of shares purchased, and the price at which the shares were purchased. A live stream of simulated market data updates each portfolio with its overall value and its percentage of gain or loss relative to the purchase price. Historical market data (the end-of-day price) is also tracked for each stock going back in time.
In the demo, simulated real-time stock data is generated by the pricer utility. This utility generates portfolios, live stock prices, and historical market data.
On the DSE OLAP side, a Hive MapReduce job calculates the greatest historical 10-day loss period for each portfolio, which is an indicator of the risk associated with a portfolio. This information is then fed back into the real-time application so customers can better gauge their potential losses.
Before you begin, make sure you have installed, configured, and started DSE on either a single node (as an Analytics node) or a cluster. If running the demo on a cluster, install and run the demo from the DSE Job Tracker (analytics seed) node.
Go to the portfolio manager demo directory.
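For example, on a packaged installation (use <install_location>/demos/portfolio_manager for a binary installation):

cd /usr/share/dse-demos/portfolio_manager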
You must run the pricer utility from a directory where you have write permissions (such as your home directory), or else run it as root or using sudo.
Run the bin/pricer utility to generate 100 days' worth of historical stock data:
To see all of the available options for this utility:
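Assuming the utility follows the conventional help flag, run:

bin/pricer --help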
If running on a single node cluster on localhost (as described in Quick Start with DataStax Enterprise):
bin/pricer -o INSERT_PRICES
bin/pricer -o UPDATE_PORTFOLIOS
bin/pricer -o INSERT_HISTORICAL_PRICES -n 100
If running the demo on a cluster:
LOCAL_IP=<demo_node_ip_address>
bin/pricer -o INSERT_PRICES -d $LOCAL_IP \
  --replication-strategy="org.apache.cassandra.locator.NetworkTopologyStrategy" \
  --strategy-properties="Analytics:1,Cassandra:1"
bin/pricer -d $LOCAL_IP -o UPDATE_PORTFOLIOS
bin/pricer -d $LOCAL_IP -o INSERT_HISTORICAL_PRICES -n 100
NetworkTopologyStrategy is the preferred replication placement strategy. For more information, see NetworkTopologyStrategy.
Start the web service.
cd website
java -jar start.jar &
Open a browser and go to http://localhost:8983/portfolio to see the real-time Portfolio Manager demo application.
Open another shell window.
Start Hive and run the demo's MapReduce job. For packaged installations:
dse hive -f /usr/share/dse-demos/portfolio_manager/10_day_loss.q
or for binary installations:
<install_location>/bin/dse hive -f <install_location>/demos/portfolio_manager/10_day_loss.q
The MapReduce job takes several minutes to run. To watch its progress in the Job Tracker, open http://localhost:50030/jobtracker.jsp in a browser.
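As an alternative to the browser, the stock Hadoop job client can list running jobs from the command line; this sketch assumes the standard job subcommand is reachable through the dse wrapper:

dse hadoop job -list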
After the job completes, refresh the Portfolio Manager web page to see the results of the Largest Historical 10-day Loss for each portfolio.
To enable Hadoop to connect to external addresses:
In the core-site.xml file, change the fs.default.name property from file:/// to cfs:<listen_address>:<rpc_port>.
This eliminates the need to specify the IP address or hostname for MapReduce jobs and all other calls to Hadoop. The core-site.xml file is located in the following locations (an example entry is shown after the list):
Packaged installations: /etc/dse/hadoop
Binary installations: /<install_location>/resources/hadoop/conf
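As a sketch, the edited entry in core-site.xml would look like the following, with <listen_address> and <rpc_port> standing in for your node's values:

<property>
  <name>fs.default.name</name>
  <value>cfs:<listen_address>:<rpc_port></value>
</property>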
Alternatively, run the command with the setting as an embedded parameter:
dse hadoop fs -Dfs.default.name="cfs:<listen_address>:<rpc_port>" -ls /