DataStax Enterprise 2.1 Documentation

Using the Job Tracker Node

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your version.


For each MapReduce job submitted to the job tracker, DataStax Enterprise schedules a series of tasks on the analytics nodes. One task tracker service per node handles the map and reduce tasks scheduled for that node. Within a data center, the job tracker monitors the execution and status of the distributed tasks that make up a MapReduce job.

Using Multiple Job Tracker Services

DataStax Enterprise 2.1 can use multiple job tracker nodes in a cluster, one per data center. In deployments with geographically distant data centers, using multiple job trackers and multiple file systems can improve performance by taking advantage of data locality within each data center.

The following sections describe tasks related to the job tracker.

Setting the Job Tracker Node

There are several ways to set the job tracker node:

  • Configure the Cassandra seeds list in cassandra.yaml. DataStax Enterprise designates the first analytics node from the seeds list as the job tracker node.

  • Start up an analytics node using the -j option.

    dse cassandra -t -j
    

    or in a binary distribution:

    <install_location>/bin/dse cassandra -t -j
    
  • Use the dsetool movejt command.

If you list any IP addresses in the seeds list of the cassandra.yaml file, DataStax Enterprise nominates a node from the list in each data center to be the job tracker.
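
Because the position of a node in the seeds list matters here, it can help to print the seeds in the order they are listed. The following is a minimal sketch, not part of the product: list_seeds is a hypothetical helper, and it assumes the seeds appear on a single line of the form - seeds: "ip1,ip2" in cassandra.yaml. It does not tell you which of the listed nodes are analytics nodes.

```shell
#!/bin/sh
# Hypothetical helper: print each address from the `- seeds: "a,b,c"` line
# of a cassandra.yaml file, one per line, in the order listed.
# Assumes the whole seeds list sits on one line.
list_seeds() {
  sed -n 's/^[[:space:]]*-[[:space:]]*seeds:[[:space:]]*//p' "$1" \
    | tr -d '"' | tr -d "'" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Usage (pass the path to your own cassandra.yaml):
#   list_seeds /path/to/cassandra.yaml
```

Compare the output with the node types reported by dsetool ring to see which listed seed is the first analytics node.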

About the Reserve Job Tracker

DataStax Enterprise 2.1 and later nominates a node in the cluster as a reserve job tracker for a data center. The reserve job tracker becomes the job tracker when, for some reason, there is no local node in the data center that can function as job tracker.

When you upgrade from DataStax Enterprise 2.0 and earlier to DataStax Enterprise 2.1, the job tracker node from the old release is automatically designated as the temporary, reserve job tracker. After migration, the local job tracker election process runs in each data center to determine permanent, reserve job trackers.

Managing the Job Tracker Using dsetool Commands

Several dsetool commands are useful for managing job tracker nodes:

  • dsetool jobtracker

    Returns the hostname and port of the job tracker for the data center in which you issued the command.

  • dsetool movejt <data center>-<workload> <node IP>

    Moves the job tracker and notifies the task tracker nodes.

  • dsetool movejt <node IP>

    In DataStax Enterprise 2.1 and higher, if you do not specify the data center name, the command moves the reserve job tracker.

  • dsetool listjt

    Lists all job tracker nodes grouped by their local data center.

  • dsetool ring

    Lists the nodes and types of the nodes in the ring.
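
The read-only commands in this list can be run together to get a quick picture of the current job tracker layout. The following is a minimal sketch: jt_overview and the DSETOOL variable are hypothetical names introduced here so that the binary-distribution path can be substituted, and the output format of each command is not assumed.

```shell
#!/bin/sh
# DSETOOL is a hypothetical override; for a binary distribution set it to
# <install_location>/bin/dsetool before calling jt_overview.
DSETOOL="${DSETOOL:-dsetool}"

# Run the three read-only commands described above, with a label before each.
jt_overview() {
  echo "Job tracker for this data center:"
  "$DSETOOL" jobtracker
  echo "All job trackers by data center:"
  "$DSETOOL" listjt
  echo "Ring:"
  "$DSETOOL" ring
}

# Usage, on a DataStax Enterprise node:
#   jt_overview
```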

Listing Job Trackers Example

If you are not sure which nodes in your DSE cluster are job trackers, run the following command:

    dsetool jobtracker

or in a binary distribution:

    <install_location>/bin/dsetool jobtracker

Moving the Job Tracker Node Example

If your primary job tracker node fails, you can use dsetool movejt to move the job tracker to another analytics node in the cluster.

  1. Log in to a DataStax Enterprise analytics node.

  2. Run the dsetool movejt command, specifying the data center name, a hyphen, the workload name Analytics, and then the IP address of the new job tracker node in your DataStax Enterprise cluster. For example, to move the job tracker to node 110.82.155.4 in the DC1 data center:

    dsetool movejt DC1-Analytics 110.82.155.4
    

    or in a binary distribution:

    <install_location>/bin/dsetool movejt DC1-Analytics 110.82.155.4
    
  3. Allow 20 seconds for all of the analytics nodes to detect the change and restart their task tracker processes.

  4. In a browser, connect to the new job tracker and confirm that it is up and running. For example (change the IP to reflect your job tracker node IP):

    http://110.82.155.4:50030
    
  5. If you are running Hive or Pig MapReduce clients, you must restart them to pick up the new job tracker node information.
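
Steps 2 through 4 above can be sketched as a small script. This is an illustration under stated assumptions, not a supported tool: move_job_tracker, jt_target, jt_url, DSETOOL, and JT_WAIT are hypothetical names, curl is assumed to be installed, and the check only confirms that the web address in step 4 answers an HTTP request.

```shell
#!/bin/sh
# DSETOOL is a hypothetical override for binary distributions
# (<install_location>/bin/dsetool).
DSETOOL="${DSETOOL:-dsetool}"

# Build the <data center>-<workload> argument, for example DC1-Analytics.
jt_target() { printf '%s-Analytics\n' "$1"; }

# Build the job tracker web address used in step 4.
jt_url() { printf 'http://%s:50030\n' "$1"; }

# move_job_tracker <data center> <node IP>
move_job_tracker() {
  dc="$1"; ip="$2"
  # Step 2: move the job tracker; stop if the command fails.
  "$DSETOOL" movejt "$(jt_target "$dc")" "$ip" || return 1
  # Step 3: give the analytics nodes time to restart their task trackers.
  sleep "${JT_WAIT:-20}"
  # Step 4: confirm that the new job tracker answers over HTTP.
  curl -sf -o /dev/null "$(jt_url "$ip")" \
    && echo "Job tracker reachable at $(jt_url "$ip")"
}

# Usage:
#   move_job_tracker DC1 110.82.155.4
```

Step 5 still applies afterward: Hive and Pig clients need a restart to pick up the new job tracker.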

Changing the Job Tracker Client Port

By default, the job tracker listens on port 8012 for client messages. You can use another port by configuring the mapred.job.tracker property.

To change the job tracker client port:

  1. Open the mapred-site.xml file for editing. The location of this file is:

    • Packaged installations: /etc/dse/hadoop
    • Binary installations: <install_location>/resources/hadoop/conf
  2. Locate the mapred.job.tracker property.

    <!-- Auto detect the dse job tracker -->
    <property>
      <name>mapred.job.tracker</name>
      <value>${dse.job.tracker}</value>
      <description>
        The address of the job tracker
      </description>
    </property>
    
  3. In the mapred.job.tracker property, change the placeholder ${dse.job.tracker} value to the port number you want to use. For example, to change the port number from the default to 8013:

    <!-- Auto detect the dse job tracker -->
    <property>
      <name>mapred.job.tracker</name>
      <value>8013</value>
      <description>
        The address of the job tracker
      </description>
    </property>
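
After editing the file, you may want to confirm which value it now holds. The following is a minimal sketch: jt_property_value is a hypothetical helper, and it assumes the <name> and <value> elements sit on separate lines, as in the listing above. It reads the file only and does not validate the XML.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows
# <name>mapred.job.tracker</name> in a mapred-site.xml file.
jt_property_value() {
  awk '
    /<name>mapred\.job\.tracker<\/name>/ { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}

# Usage (packaged installation):
#   jt_property_value /etc/dse/hadoop/mapred-site.xml
```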