All,
I've tried something that seems to work, but I don't know whether that's just coincidence. I have a MapReduce job that we have been running for ages on a regular Hadoop cluster. For some reason, it appears to read the JobTracker address directly from mapred-site.xml, and when I leave that setting at its default, the job fails with:
java.lang.RuntimeException: Not a host:port pair: ${brisk.job.tracker}
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:138)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:125)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2549)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:454)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:437)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:477)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:475)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:464)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
....
However, if I explicitly set the JobTracker address in mapred-site.xml, everything seems to work fine. What I'm not sure about is whether the cluster will ever elect a JobTracker other than the one I specify, thereby screwing things up. I realize that by setting it explicitly I lose a great deal of flexibility, but will it actively break anything?
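For reference, the explicit setting I mean looks roughly like the sketch below. I'm assuming mapred.job.tracker is the property involved, since that is what JobTracker.getAddress reads in stock Hadoop; the host and port are placeholders, not real values from my cluster:

```xml
<!-- mapred-site.xml: sketch only; host and port are hypothetical -->
<property>
  <name>mapred.job.tracker</name>
  <!-- replaces the default ${brisk.job.tracker} placeholder with a literal host:port pair -->
  <value>jobtracker.example.com:8012</value>
</property>
```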
Thanks,
Matt
