DataStax Enterprise 4.5

Using Mahout

DataStax Enterprise integrates Apache Mahout, a Hadoop component that offers machine learning libraries. Mahout facilitates building intelligent applications that learn from data and user input. Machine learning has many use cases, and some are well known, such as the ability of web sites to recommend products to visitors based on previous visits.

Currently, Mahout jobs that use Lucene features are not supported.

Running the Mahout demo

The DataStax Enterprise installation includes a Mahout demo. The demo determines, with some percentage of certainty, which entries in the input data remained statistically in control and which did not. The input data is historical time series data. Using Mahout algorithms, the demo classifies the data into categories based on whether it exhibited relatively stable behavior over a period of time, and produces a file of classified results. This procedure describes how to run the Mahout demo.

Procedure

  1. After installing DataStax Enterprise, start an analytics node.
  2. Go to the demos directory in one of the following locations:
    • Installer-Services and Package installations: /usr/share/dse/demos/mahout
    • Installer-No Services and Tarball installations: install_location/demos/mahout
  3. Run the script in the demos directory. For example, on Linux:
    ./run_mahout_example.sh

    If you are running OpsCenter, you can now view the Hadoop job progress.

    When the demo completes, a message on standard output gives the location of the output file. For example:

    The output is in /tmp/clusteranalyze.txt
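
The steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the DataStax distribution: the function names `find_mahout_demo_dir` and `run_mahout_demo` are hypothetical, and `DSE_HOME` is an assumed environment variable standing in for install_location. It assumes an analytics node is already running, and it uses only the two demo locations and the script name given in the procedure.

```shell
#!/bin/sh
# Hypothetical helper: locate the Mahout demo directory.
# DSE_HOME is an assumed variable standing in for install_location
# (Installer-No Services and Tarball installations).
find_mahout_demo_dir() {
    for dir in "/usr/share/dse/demos/mahout" "${DSE_HOME:-}/demos/mahout"; do
        # Accept a directory only if it contains the demo script.
        if [ -f "$dir/run_mahout_example.sh" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: run the demo script from its own directory,
# as in step 3 of the procedure.
run_mahout_demo() {
    demo_dir=$(find_mahout_demo_dir) || {
        echo "Mahout demo directory not found" >&2
        return 1
    }
    # The subshell keeps the caller's working directory unchanged.
    (cd "$demo_dir" && ./run_mahout_example.sh)
}
```

To use the sketch, source the file and call `run_mahout_demo`. The demo then reports the location of its output file on standard output, as described above.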