DataStax Enterprise 3.1 Documentation

Using the job tracker node

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


For each MapReduce job submitted to the job tracker, DataStax Enterprise schedules a series of tasks on the analytics nodes. One task tracker service per node handles the map and reduce tasks scheduled for that node. Within a data center, the job tracker monitors the execution and status of distributed tasks that comprise a MapReduce job.

Using multiple job tracker services

DataStax Enterprise 2.1 and later can use multiple job tracker nodes in a cluster, one per data center. In deployments having multiple data centers that are far apart, using multiple job trackers and multiple file systems can improve performance by taking advantage of data locality in each data center.

Tasks related to the job tracker are described in the following sections.

Setting the job tracker node

There are several ways to set the job tracker node:

  • Configure the Cassandra seeds list in cassandra.yaml. DataStax Enterprise designates the first analytics node from the seeds list as the job tracker node.

  • Start up an analytics node using the -j option.

    dse cassandra -t -j

    or in a binary distribution:

    <install_location>/bin/dse cassandra -t -j
  • Use the dsetool movejt command.

If you list any IP addresses in the seeds list of the cassandra.yaml file, DataStax Enterprise nominates a node from the list in each data center to be the job tracker.
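The nomination rule above can be sketched as follows. This is a minimal illustration, not DSE's actual implementation: the seed list and the per-node data center/workload lookup are hypothetical inputs (in a real cluster that information comes from gossip, not from a dictionary):

```python
def nominate_job_trackers(seeds, node_info):
    """Pick the first analytics seed per data center as its job tracker.

    seeds     -- seed IPs in cassandra.yaml order
    node_info -- hypothetical map: ip -> (data_center, is_analytics)
    Returns {data_center: job_tracker_ip}.
    """
    trackers = {}
    for ip in seeds:
        dc, is_analytics = node_info[ip]
        # The first analytics node listed for a data center wins.
        if is_analytics and dc not in trackers:
            trackers[dc] = ip
    return trackers
```

Non-analytics seeds are skipped, which matches the statement that the first *analytics* node from the seeds list becomes the job tracker.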

About the reserve job tracker

DataStax Enterprise nominates a node in the cluster as a reserve job tracker for a data center. The reserve job tracker becomes the job tracker when, for some reason, there is no local node in the data center that can function as job tracker.

Using common hadoop commands

Use the familiar hadoop fs commands to perform operations in the Cassandra File System (CassandraFS) that correspond to the open source HDFS file system shell commands:

  • Packaged or AMI distributions:

    dse hadoop fs <option>
  • Tarball installation:

    <install_location>/bin/dse hadoop fs <option>

For example, using this syntax, you can load MapReduce input from the local file system into the Cassandra File System on Linux.

dse hadoop fs -mkdir /user/hadoop/wordcount/input

dse hadoop fs -copyFromLocal $HADOOP_EXAMPLE/data/state_of_union/state_of_union.txt /user/hadoop/wordcount/input

To list all options for the hadoop fs commands:

dse hadoop fs -help

A DSE command reference lists other commands.

Managing the job tracker using dsetool commands

Several dsetool commands are useful for managing job tracker nodes:

  • dsetool jobtracker

Returns the hostname and port of the job tracker in the data center where you issued the command.

  • dsetool movejt <data center>-<workload> <node IP>

    Moves the job tracker and notifies the task tracker nodes.

  • dsetool movejt <node IP>

    In DataStax Enterprise 2.1 and later, if you do not specify the data center name, the command moves the reserve job tracker.

  • dsetool listjt

    Lists all job tracker nodes grouped by their local data center.

  • dsetool ring

    Lists the nodes and types of the nodes in the ring.

Listing job trackers example

If you are not sure which nodes in your DSE cluster are job trackers, run the following command:

dsetool jobtracker

or in a binary distribution:

<install_location>/bin/dsetool jobtracker

Moving the job tracker node example

If your primary job tracker node fails, you can use dsetool movejt to move the job tracker to another analytics node in the cluster.

  1. Log in to a DataStax Enterprise analytics node.

  2. Run the dsetool movejt command and specify the data center name, a hyphen, Analytics (for the workload), and the IP address of the new job tracker node in your DataStax Enterprise cluster. For example, to move the job tracker to a node in the DC1 data center:

    dsetool movejt DC1-Analytics <node IP>

    or in a binary distribution:

    <install_location>/bin/dsetool movejt DC1-Analytics <node IP>
  3. Allow 20 seconds for all of the analytics nodes to detect the change and restart their task tracker processes.

  4. In a browser, connect to the new job tracker and confirm that it is up and running.
  5. If you are running Hive or Pig MapReduce clients, you must restart them to pick up the new job tracker node information.
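Steps 3 and 4 amount to waiting for the new job tracker to start accepting connections. A minimal sketch of such a check, assuming the default client port 8012 and a plain TCP probe (the hostname, poll interval, and timeout are illustrative, not part of DSE):

```python
import socket
import time

def wait_for_job_tracker(host, port=8012, timeout=20.0):
    """Return True once a TCP connection to host:port succeeds.

    port 8012 is the default job tracker client port; the timeout mirrors
    the roughly 20-second settling window the steps above allow.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the job tracker is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry until the deadline
    return False
```

A probe like this only proves the port is open; step 4's browser check remains the way to confirm the job tracker is actually serving requests.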

Changing the job tracker client port

By default, the job tracker listens on port 8012 for client messages. You can use another port by configuring the mapred.job.tracker property.

To change the job tracker client port:

  1. Open the mapred-site.xml file for editing. The location of this file is:

    • Packaged installations: /etc/dse/hadoop
    • Binary installations: /<install_location>/resources/hadoop/conf
  2. Locate the mapred.job.tracker property.

    <!-- Auto detect the dse job tracker -->
    <property>
      <name>mapred.job.tracker</name>
      <value>${dse.job.tracker}</value>
      <description>The address of the job tracker</description>
    </property>
  3. In the mapred.job.tracker property, replace the ${dse.job.tracker} placeholder value with the address and port you want to use. For example, change the port number from the default, 8012, to 8013:

    <!-- Auto detect the dse job tracker -->
    <property>
      <name>mapred.job.tracker</name>
      <value><node IP>:8013</value>
      <description>The address of the job tracker</description>
    </property>
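If you script this edit, a small helper along the following lines can update the property. This is a sketch: the configuration layout is assumed from the Hadoop-style mapred-site.xml format shown above, and the host:port value passed in the example is hypothetical:

```python
import xml.etree.ElementTree as ET

def set_property(conf_xml, name, value):
    """Set (or add) a <property> in a Hadoop-style configuration document.

    conf_xml -- the mapred-site.xml content as a string
    Returns the updated XML as a string.
    """
    root = ET.fromstring(conf_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = value
            break
    else:
        # Property not present: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

conf = """<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${dse.job.tracker}</value>
    <description>The address of the job tracker</description>
  </property>
</configuration>"""

# Hypothetical host:port -- substitute your job tracker node's address.
updated = set_property(conf, "mapred.job.tracker", "10.0.0.1:8013")
```

Remember that, as with a manual edit, the change only takes effect after the affected services are restarted.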