DataStax Enterprise 2.1 and later support Apache Mahout, a Hadoop component that offers machine learning libraries. Mahout facilitates building intelligent applications that learn from data and user input. Machine learning has many use cases, and some, such as the ability of web sites to recommend products to visitors based on previous visits, are well known.
Currently, Mahout jobs that use Lucene features are not supported.
The DataStax Enterprise installation includes a Mahout demo. The demo determines, with some percentage of certainty, which entries in the input data remained statistically in control and which did not. The input data is historical time series data. Using Mahout algorithms, the demo classifies the data into categories based on whether it exhibited relatively stable behavior over a period of time. The demo produces a file of classified results.
To run the Mahout demo
After installing DataStax Enterprise, start an analytics node.
Go to the demos directory in one of these locations:
Run the script in the demo directory. For example, on Linux:
If you are running OpsCenter, view the Hadoop job progress:
When the demo completes, a message appears on the standard output about the location of the output file. For example:
The output is in /tmp/clusteranalyze.txt
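Once the message appears, the results file can be inspected from the shell. The following is a minimal sketch, assuming the output location shown in the example message above; `show_results` is a hypothetical helper, not part of DataStax Enterprise, and the 20-line limit is an arbitrary choice.

```shell
# Location taken from the example message above; override OUTPUT_FILE
# if your run reports a different path.
OUTPUT_FILE="${OUTPUT_FILE:-/tmp/clusteranalyze.txt}"

# show_results is a hypothetical helper, not a DataStax Enterprise command.
show_results() {
  if [ -s "$1" ]; then
    # The file exists and is not empty: print its first 20 lines.
    head -n 20 "$1"
  else
    echo "No results found at $1" >&2
    return 1
  fi
}

show_results "$OUTPUT_FILE" || echo "Run the demo first, then try again."
```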
You can run Mahout commands on the dse command line. For example, on Mac OS X, to list the available commands:
cd ~/dse-3.0
bin/dse mahout
The list of commands appears.
Mahout command line help
To get help on a command, use the command as the first argument, followed by the --help option. For example:
cd ~/dse-3.0
bin/dse mahout arff.vector --help
The output is help on the arff.vector command.
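The same pattern works for any command in the list. As a rough sketch, it can be wrapped in a small shell function. The names `mahout_help`, `DSE_HOME`, and `DRY_RUN` are assumptions made for illustration and are not part of DataStax Enterprise; the install location defaults to the ~/dse-3.0 directory used in the examples above.

```shell
# Assumed install location, matching the examples above; override as needed.
DSE_HOME="${DSE_HOME:-$HOME/dse-3.0}"

# mahout_help is a hypothetical wrapper around "dse mahout <command> --help".
mahout_help() {
  if [ -z "$1" ]; then
    echo "usage: mahout_help <mahout command>" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command line instead of executing it.
    echo "$DSE_HOME/bin/dse mahout $1 --help"
  else
    "$DSE_HOME/bin/dse" mahout "$1" --help
  fi
}

# Show the command that would run for arff.vector without executing it.
DRY_RUN=1 mahout_help arff.vector
```

Dropping the DRY_RUN=1 prefix runs the command for real on a node where DataStax Enterprise is installed.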
Add Mahout classes to the class path and execute a Hadoop command
You can use Hadoop commands to work with Mahout. The following syntax first adds Mahout classes to the class path and then executes the Hadoop command:
dse mahout hadoop <hadoop command> <options>
For example, the following command takes a Mahout file as input and converts it to text so that you can read it:
cd ~/dse-3.0
bin/dse mahout hadoop fs -text <mahout file> | more
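To keep the converted text instead of paging through it, the output of the same command can be redirected to a local file. The sketch below assumes the ~/dse-3.0 install directory from the examples above; `mahout_to_text` and `DSE_BIN` are hypothetical names introduced only for illustration.

```shell
# Assumed path to the dse script, matching the examples above; override as needed.
DSE_BIN="${DSE_BIN:-$HOME/dse-3.0/bin/dse}"

# mahout_to_text is a hypothetical helper: it converts a Mahout file to text
# with "dse mahout hadoop fs -text" and saves the result to a local file.
mahout_to_text() {
  if [ "$#" -ne 2 ]; then
    echo "usage: mahout_to_text <mahout file> <local text file>" >&2
    return 2
  fi
  "$DSE_BIN" mahout hadoop fs -text "$1" > "$2"
}

# Example call (not executed here); replace the placeholder with a real path:
# mahout_to_text <mahout file> /tmp/mahout-output.txt
```

Writing to a file makes it possible to search the result with ordinary text tools afterward.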
The Apache web site offers an in-depth tutorial.