<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Submitting a Map-Reduce Job using CassandraJobConf</title>
		<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Sat, 18 May 2013 23:59:52 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/submitting-a-map-reduce-job-using-cassandrajobconf" rel="self" type="application/rss+xml" />

		<item>
			<title>deltafoxtrot on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-272</link>
			<pubDate>Wed, 06 Jul 2011 14:55:01 +0000</pubDate>
			<dc:creator>deltafoxtrot</dc:creator>
			<guid isPermaLink="false">272@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thanks for your feedback.  Let me take a step back for a second and explain what we had in place prior to our Brisk integration effort.  Our product comprises three web apps (each of which is the product of multiple projects, all Maven-based).  Two of the three web apps handle the OLTP side of our product and one handles the OLAP side.  About a month ago, we replaced one of our OLAP tasks with a Hadoop implementation and saw a significant performance improvement.  This Hadoop task lived within the web app and was executed via the ToolRunner as the result of a RESTful request to the web app.  All of this worked, albeit with limitations.  So given the promise of Brisk, we're quite keen to integrate.  Unfortunately we did not find any examples on your site that relate to the way we submitted jobs previously.  When you write that our service &#34;seems very non-standard&#34;, what do you mean specifically?  What is the &#34;standard&#34; way to submit jobs?  Command line?&#60;/p&#62;
&#60;p&#62;I've aligned the versions of Hadoop to use 0.20.203-brisk1.  As we're a Maven shop, we pushed the ivy/hadoop-core.pom.xml that we found in your beta 1 distribution when creating an artifact for the Brisk version of Hadoop.  I'm not sure whether this is an accurate pom for the Brisk Hadoop core code.  We also added the core-site.xml to our web app (the same one that came with the Brisk beta 1 binary).  It initially failed (ClassNotFound for SnappyException), though adding that dependency got us past it.  Then it complained that it couldn't find the cassandra.yaml.  Needless to say, it seems this is not the recommended approach.  :o)
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-271</link>
			<pubDate>Wed, 06 Jul 2011 12:21:51 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">271@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Is the version of Hadoop you are using in your service the same as Brisk's? (You should be using the Brisk jars.)  Also, you need to use the core-site.xml in your config.  I don't see cfs:// anywhere in your config log statements.&#60;/p&#62;
&#60;p&#62;I don't quite understand what you are doing in terms of your service but it seems very non-standard :)
&#60;/p&#62;</description>
		</item>
		<item>
			<title>deltafoxtrot on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-270</link>
			<pubDate>Wed, 06 Jul 2011 10:53:38 +0000</pubDate>
			<dc:creator>deltafoxtrot</dc:creator>
			<guid isPermaLink="false">270@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;The current run method with a temporarily hard-coded JobTracker address is posted below.  It fails downstream from the Job instantiation.&#60;/p&#62;
&#60;p&#62;	@Override&#60;br /&#62;
	public int run(String[] args) throws Exception {&#60;br /&#62;
		try {&#60;br /&#62;
			Configuration conf = new Configuration();&#60;br /&#62;
			conf.set(&#34;mapred.job.tracker&#34;, &#34;ondroid-si-1:8012&#34;);&#60;/p&#62;
&#60;p&#62;			new GenericOptionsParser(conf, args).getRemainingArgs();&#60;/p&#62;
&#60;p&#62;			ConfigHelper.setRpcPort(conf, Integer.valueOf(port).toString());&#60;br /&#62;
			ConfigHelper.setInitialAddress(conf, host);&#60;br /&#62;
			ConfigHelper.setPartitioner(conf, RandomPartitioner.class.getCanonicalName());&#60;br /&#62;
			ConfigHelper.setInputColumnFamily(conf, IN_KEYSPACE, IN_COLUMN_FAMILY);&#60;br /&#62;
			ConfigHelper.setOutputColumnFamily(conf, OUT_KEYSPACE, OUT_COLUMN_FAMILY);&#60;/p&#62;
&#60;p&#62;			SlicePredicate predicate = new SlicePredicate();&#60;br /&#62;
			predicate.setSlice_range(new SliceRange(bytes(null), bytes(null), false, Integer.MAX_VALUE));&#60;/p&#62;
&#60;p&#62;			ConfigHelper.setInputSlicePredicate(conf, predicate);&#60;/p&#62;
&#60;p&#62;			Job job = new Job(conf, &#34;dependency.graph&#34;);&#60;br /&#62;
			job.setJarByClass(DependencyGraph.class);&#60;br /&#62;
			job.setMapperClass(DependencyGraphMapper.class);&#60;br /&#62;
			job.setReducerClass(DependencyGraphReducer.class);&#60;/p&#62;
&#60;p&#62;			job.setMapOutputKeyClass(Text.class);&#60;br /&#62;
			job.setMapOutputValueClass(Text.class);&#60;br /&#62;
			job.setOutputKeyClass(ByteBuffer.class);&#60;br /&#62;
			job.setOutputValueClass(List.class);&#60;/p&#62;
&#60;p&#62;			job.setOutputFormatClass(ColumnFamilyOutputFormat.class);&#60;br /&#62;
			job.setInputFormatClass(ColumnFamilyInputFormat.class);&#60;/p&#62;
&#60;p&#62;			if (log.isInfoEnabled()) {&#60;br /&#62;
				log.info(&#34;Submitting DependencyGraph job...&#34;);&#60;br /&#62;
			}&#60;br /&#62;
			job.waitForCompletion(true);&#60;br /&#62;
			return 0;&#60;/p&#62;
&#60;p&#62;		} catch (Exception e) {&#60;br /&#62;
			log.error(&#34;Caught exception waiting for job to complete:\n&#34; + ExceptionUtils.getStackTrace(e));&#60;br /&#62;
			throw e;&#60;br /&#62;
		}&#60;/p&#62;
&#60;p&#62;	}
&#60;/p&#62;</description>
		</item>
		<item>
			<title>deltafoxtrot on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-269</link>
			<pubDate>Wed, 06 Jul 2011 10:37:02 +0000</pubDate>
			<dc:creator>deltafoxtrot</dc:creator>
			<guid isPermaLink="false">269@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Ok.  We've set the JobTracker address based on the response from brisktool.  When we execute now it fails with an EOFException (see detailed logging and stacktrace below).  Any ideas?&#60;/p&#62;
&#60;p&#62;From our application log:&#60;/p&#62;
&#60;p&#62;06-07-2011 11:31:40 [http-8080-1]  INFO controller.DataController - Computing targets graphs started at: Wed Jul 06 11:31:40 BST 2011&#60;br /&#62;
06-07-2011 11:31:40 [http-8080-1]  INFO cassandra.AbstractStore - Truncating column family 'data_graphs'&#60;br /&#62;
06-07-2011 11:31:40 [http-8080-1]  INFO client.CassandraClient - CassandraProxyClient(localhost, 9170, true, ROUND_ROBIN)&#60;br /&#62;
06-07-2011 11:31:40 [http-8080-1]  INFO client.CassandraClient - Connected to cassandra at localhost:9170&#60;br /&#62;
06-07-2011 11:31:40 [http-8080-1]  INFO hadoop.DependencyGraph - Configuring new DependencyGraph job against host localhost and port 9170&#60;br /&#62;
06-07-2011 11:31:40 [http-8080-1]  INFO hadoop.DependencyGraph - Configuration Properties:&#60;br /&#62;
io.seqfile.compress.blocksize=1000000, fs.checkpoint.size=67108864, io.skip.checksum.errors=false, mapred.used.genericoptionsparser=true, fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem, fs.s3.maxRetries=4, webinterface.private.actions=false, fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem, hadoop.native.lib=true, fs.checkpoint.edits.dir=${fs.checkpoint.dir}, ipc.server.listen.queue.size=128, fs.default.name=file:///, ipc.client.idlethreshold=4000, fs.hsftp.impl=org.apache.hadoop.hdfs.HsftpFileSystem, hadoop.tmp.dir=/tmp/hadoop-${user.name}, fs.checkpoint.dir=${hadoop.tmp.dir}/dfs/namesecondary, fs.s3.block.size=67108864, hadoop.security.authorization=false, io.serializations=org.apache.hadoop.io.serializer.WritableSerialization, hadoop.util.hash.type=murmur, io.seqfile.lazydecompress=true, io.file.buffer.size=4096, io.mapfile.bloom.size=1048576, fs.s3.buffer.dir=${hadoop.tmp.dir}/s3, hadoop.logfile.size=10000000, ipc.client.kill.max=10, io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec, topology.script.number.args=100, fs.har.impl=org.apache.hadoop.fs.HarFileSystem, io.seqfile.sorter.recordlimit=1000000, fs.trash.interval=0, local.cache.size=10737418240, ipc.server.tcpnodelay=false, ipc.client.connect.max.retries=10, fs.ramfs.impl=org.apache.hadoop.fs.InMemoryFileSystem, hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.StandardSocketFactory, fs.kfs.impl=org.apache.hadoop.fs.kfs.KosmosFileSystem, fs.checkpoint.period=3600, topology.node.switch.mapping.impl=org.apache.hadoop.net.ScriptBasedMapping, hadoop.logfile.count=10, fs.ftp.impl=org.apache.hadoop.fs.ftp.FTPFileSystem, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, ipc.client.connection.maxidletime=10000, io.mapfile.bloom.error.rate=0.005, io.bytes.per.checksum=512, mapred.job.tracker=ondroid-si-1:8012, 
fs.har.impl.disable.cache=true, ipc.client.tcpnodelay=false, fs.hftp.impl=org.apache.hadoop.hdfs.HftpFileSystem, fs.s3.sleepTimeSeconds=10,&#60;br /&#62;
06-07-2011 11:31:41 [http-8080-1] ERROR hadoop.DependencyGraph - Caught exception waiting for job to complete:&#60;br /&#62;
java.io.IOException: Call to ondroid-si-1/10.20.5.191:8012 failed on local exception: java.io.EOFException&#60;br /&#62;
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)&#60;br /&#62;
	at org.apache.hadoop.ipc.Client.call(Client.java:743)&#60;br /&#62;
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)&#60;br /&#62;
	at org.apache.hadoop.mapred.$Proxy105.getProtocolVersion(Unknown Source)&#60;br /&#62;
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)&#60;br /&#62;
	at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)&#60;br /&#62;
	at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)&#60;br /&#62;
	at org.apache.hadoop.mapred.JobClient.&#38;lt;init&#38;gt;(JobClient.java:410)&#60;br /&#62;
	at org.apache.hadoop.mapreduce.Job.&#38;lt;init&#38;gt;(Job.java:50)&#60;br /&#62;
	at org.apache.hadoop.mapreduce.Job.&#38;lt;init&#38;gt;(Job.java:54)&#60;br /&#62;
	at ntoklo.matrix.impl.computation.hadoop.DependencyGraph.run(DependencyGraph.java:87)&#60;br /&#62;
	at ntoklo.matrix.impl.controller.DataController.computeGraphs(DataController.java:206)&#60;br /&#62;
	at ntoklo.matrix.impl.MatrixImpl.serviceComputeGraphs(MatrixImpl.java:89)&#60;br /&#62;
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&#60;br /&#62;
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)&#60;br /&#62;
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)&#60;br /&#62;
	at java.lang.reflect.Method.invoke(Unknown Source)&#60;br /&#62;
	at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:167)&#60;br /&#62;
	at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:70)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:279)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:136)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:86)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:136)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:74)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1347)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1279)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1229)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1219)&#60;br /&#62;
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:419)&#60;br /&#62;
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)&#60;br /&#62;
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)&#60;br /&#62;
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)&#60;br /&#62;
	at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:216)&#60;br /&#62;
	at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:141)&#60;br /&#62;
	at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)&#60;br /&#62;
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:63)&#60;br /&#62;
	at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:122)&#60;br /&#62;
	at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:110)&#60;br /&#62;
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)&#60;br /&#62;
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)&#60;br /&#62;
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)&#60;br /&#62;
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)&#60;br /&#62;
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)&#60;br /&#62;
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)&#60;br /&#62;
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)&#60;br /&#62;
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)&#60;br /&#62;
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)&#60;br /&#62;
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)&#60;br /&#62;
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)&#60;br /&#62;
	at java.lang.Thread.run(Unknown Source)&#60;br /&#62;
Caused by: java.io.EOFException&#60;br /&#62;
	at java.io.DataInputStream.readInt(Unknown Source)&#60;br /&#62;
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)&#60;br /&#62;
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)&#60;/p&#62;
&#60;p&#62;From the Cassandra/Brisk system.log:&#60;/p&#62;
&#60;p&#62;WARN [pool-3-thread-1] 2011-07-06 11:34:10,110 Server.java (line 1110) Incorrect header or version mismatch from 10.20.5.191:57394 got version 3 expected version 4
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-266</link>
			<pubDate>Tue, 05 Jul 2011 16:59:34 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">266@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;No, but this explains the problem :)&#60;/p&#62;
&#60;p&#62;You need to know ahead of time what the jobtracker address is.  Assuming you have run brisktool jobtracker and have the location, you must set it as the value of the property mapred.job.tracker in your configuration.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>deltafoxtrot on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-265</link>
			<pubDate>Tue, 05 Jul 2011 16:54:13 +0000</pubDate>
			<dc:creator>deltafoxtrot</dc:creator>
			<guid isPermaLink="false">265@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;It is executed via a call to a RESTful endpoint in a web app.  We just create a new instance of the class that implements Tool and call its run method.  Are we required to use the command line to schedule MR jobs in Brisk?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-263</link>
			<pubDate>Tue, 05 Jul 2011 16:31:09 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">263@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;How are you executing the hadoop job?&#60;/p&#62;
&#60;p&#62;You should be using &#34;brisk hadoop jar myjar.jar ...&#34;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>deltafoxtrot on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-262</link>
			<pubDate>Tue, 05 Jul 2011 16:28:57 +0000</pubDate>
			<dc:creator>deltafoxtrot</dc:creator>
			<guid isPermaLink="false">262@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thanks again for your response.  The updated run method is now:&#60;/p&#62;
&#60;p&#62;	@Override&#60;br /&#62;
	public int run(String[] args) throws Exception {&#60;br /&#62;
		Configuration conf = new Configuration();&#60;br /&#62;
		new GenericOptionsParser(conf, args).getRemainingArgs();&#60;br /&#62;
//		conf.set(&#34;mapred.job.tracker&#34;, CassandraJobConf.getJobTrackerNode().getHostName() + &#34;:8012&#34;);&#60;br /&#62;
//		conf.set(&#34;mapreduce.jobtracker.address&#34;, CassandraJobConf.getJobTrackerNode().getHostName() + &#34;:8012&#34;);&#60;br /&#62;
		Job job = new Job(conf, &#34;dependency.graph&#34;);&#60;br /&#62;
[...]&#60;/p&#62;
&#60;p&#62;However, it doesn't appear that Brisk sets the JobTracker host.  I've dumped the configuration properties before the job is submitted.&#60;/p&#62;
&#60;p&#62;05-07-2011 17:21:20 [http-8080-1]  INFO controller.DataController - Computing targets graphs started at: Tue Jul 05 17:21:20 BST 2011&#60;br /&#62;
05-07-2011 17:21:20 [http-8080-1]  INFO cassandra.AbstractStore - Truncating column family 'data_graphs'&#60;br /&#62;
05-07-2011 17:21:20 [http-8080-1]  INFO hadoop.DependencyGraph - Configuring new DependencyGraph job against host localhost and port 9170&#60;br /&#62;
05-07-2011 17:21:20 [http-8080-1]  INFO hadoop.DependencyGraph - Configuration Properties:&#60;br /&#62;
io.seqfile.compress.blocksize=1000000&#60;br /&#62;
fs.checkpoint.size=67108864&#60;br /&#62;
io.skip.checksum.errors=false&#60;br /&#62;
mapred.used.genericoptionsparser=true&#60;br /&#62;
fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem&#60;br /&#62;
fs.s3.maxRetries=4&#60;br /&#62;
webinterface.private.actions=false&#60;br /&#62;
fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem&#60;br /&#62;
hadoop.native.lib=true&#60;br /&#62;
fs.checkpoint.edits.dir=${fs.checkpoint.dir}&#60;br /&#62;
ipc.server.listen.queue.size=128&#60;br /&#62;
fs.default.name=file:///&#60;br /&#62;
ipc.client.idlethreshold=4000&#60;br /&#62;
fs.hsftp.impl=org.apache.hadoop.hdfs.HsftpFileSystem&#60;br /&#62;
hadoop.tmp.dir=/tmp/hadoop-${user.name}&#60;br /&#62;
fs.checkpoint.dir=${hadoop.tmp.dir}/dfs/namesecondary&#60;br /&#62;
fs.s3.block.size=67108864&#60;br /&#62;
hadoop.security.authorization=false&#60;br /&#62;
io.serializations=org.apache.hadoop.io.serializer.WritableSerialization&#60;br /&#62;
hadoop.util.hash.type=murmur&#60;br /&#62;
io.seqfile.lazydecompress=true&#60;br /&#62;
io.file.buffer.size=4096&#60;br /&#62;
io.mapfile.bloom.size=1048576&#60;br /&#62;
fs.s3.buffer.dir=${hadoop.tmp.dir}/s3&#60;br /&#62;
hadoop.logfile.size=10000000&#60;br /&#62;
ipc.client.kill.max=10&#60;br /&#62;
io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec&#60;br /&#62;
topology.script.number.args=100&#60;br /&#62;
fs.har.impl=org.apache.hadoop.fs.HarFileSystem&#60;br /&#62;
io.seqfile.sorter.recordlimit=1000000&#60;br /&#62;
fs.trash.interval=0&#60;br /&#62;
local.cache.size=10737418240&#60;br /&#62;
ipc.server.tcpnodelay=false&#60;br /&#62;
ipc.client.connect.max.retries=10&#60;br /&#62;
fs.ramfs.impl=org.apache.hadoop.fs.InMemoryFileSystem&#60;br /&#62;
hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.StandardSocketFactory&#60;br /&#62;
fs.kfs.impl=org.apache.hadoop.fs.kfs.KosmosFileSystem&#60;br /&#62;
fs.checkpoint.period=3600&#60;br /&#62;
topology.node.switch.mapping.impl=org.apache.hadoop.net.ScriptBasedMapping&#60;br /&#62;
hadoop.logfile.count=10&#60;br /&#62;
fs.ftp.impl=org.apache.hadoop.fs.ftp.FTPFileSystem&#60;br /&#62;
fs.file.impl=org.apache.hadoop.fs.LocalFileSystem&#60;br /&#62;
fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem&#60;br /&#62;
ipc.client.connection.maxidletime=10000&#60;br /&#62;
io.mapfile.bloom.error.rate=0.005&#60;br /&#62;
io.bytes.per.checksum=512&#60;br /&#62;
fs.har.impl.disable.cache=true&#60;br /&#62;
ipc.client.tcpnodelay=false&#60;br /&#62;
fs.hftp.impl=org.apache.hadoop.hdfs.HftpFileSystem&#60;br /&#62;
fs.s3.sleepTimeSeconds=10&#60;/p&#62;
&#60;p&#62;If I manually set the mapreduce.jobtracker.address property on the configuration (e.g., conf.set(&#34;mapreduce.jobtracker.address&#34;, CassandraJobConf.getJobTrackerNode().getHostName() + &#34;:8012&#34;)), then an exception will be thrown to indicate it can't find the cassandra.yaml (see below).&#60;/p&#62;
&#60;p&#62;05-07-2011 17:07:29 [http-8080-1] ERROR config.DatabaseDescriptor - Fatal configuration error&#60;br /&#62;
org.apache.cassandra.config.ConfigurationException: Cannot locate cassandra.yaml&#60;br /&#62;
	at org.apache.cassandra.config.DatabaseDescriptor.getStorageConfigURL(DatabaseDescriptor.java:111)&#60;br /&#62;
	at org.apache.cassandra.config.DatabaseDescriptor.&#38;lt;clinit&#38;gt;(DatabaseDescriptor.java:121)&#60;br /&#62;
	at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:397)&#60;br /&#62;
	at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:94)&#60;br /&#62;
	at org.apache.cassandra.db.SliceByNamesReadCommand.&#38;lt;init&#38;gt;(SliceByNamesReadCommand.java:44)&#60;br /&#62;
	at org.apache.cassandra.db.SliceByNamesReadCommand.&#38;lt;init&#38;gt;(SliceByNamesReadCommand.java:38)&#60;br /&#62;
	at org.apache.cassandra.hadoop.trackers.TrackerManager.getCurrentJobtrackerLocation(TrackerManager.java:51)&#60;br /&#62;
	at org.apache.cassandra.hadoop.trackers.CassandraJobConf.getJobTrackerNode(CassandraJobConf.java:62)&#60;br /&#62;
	at ntoklo.matrix.impl.computation.hadoop.DependencyGraph.run(DependencyGraph.java:77)&#60;br /&#62;
	at ntoklo.matrix.impl.controller.DataController.computeGraphs(DataController.java:206)&#60;br /&#62;
	at ntoklo.matrix.impl.MatrixImpl.serviceComputeGraphs(MatrixImpl.java:89)&#60;br /&#62;
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&#60;br /&#62;
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)&#60;br /&#62;
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)&#60;br /&#62;
	at java.lang.reflect.Method.invoke(Unknown Source)&#60;br /&#62;
	at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:167)&#60;br /&#62;
	at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:70)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:279)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:136)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:86)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:136)&#60;br /&#62;
	at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:74)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1347)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1279)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1229)&#60;br /&#62;
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1219)&#60;br /&#62;
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:419)&#60;br /&#62;
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)&#60;br /&#62;
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)&#60;br /&#62;
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)&#60;/p&#62;
&#60;p&#62;If I make the brisk cassandra.yaml available to the web app classpath, then I get the following error:&#60;/p&#62;
&#60;p&#62;05-07-2011 17:12:43 [http-8080-1]  INFO config.DatabaseDescriptor - Loading settings from file:/opt/SP/apps/ntoklo/services%23matrix/WEB-INF/classes/cassandra.yaml&#60;br /&#62;
05-07-2011 17:12:43 [http-8080-1]  INFO config.DatabaseDescriptor - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap&#60;br /&#62;
05-07-2011 17:12:43 [http-8080-1] ERROR hadoop.DependencyGraph - java.lang.IllegalArgumentException: Unknown ColumnFamily jobtracker in keyspace brisk_system&#60;/p&#62;
&#60;p&#62;Any thoughts?  And again, many thanks for your assistance!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-261</link>
			<pubDate>Tue, 05 Jul 2011 14:57:19 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">261@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Ah, it looks like you need to add the following to your main class (where conf is the JobConf instance):&#60;/p&#62;
&#60;p&#62;String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();&#60;/p&#62;
&#60;p&#62;This will allow Brisk to set the jobtracker host.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>deltafoxtrot on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-260</link>
			<pubDate>Tue, 05 Jul 2011 14:47:15 +0000</pubDate>
			<dc:creator>deltafoxtrot</dc:creator>
			<guid isPermaLink="false">260@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thanks for responding so quickly!  After swapping out the CassandraJobConf for the deprecated JobConf, we still cannot see the job in the JobTracker or TaskTracker sites.  The log output is posted below.  We only have a single node configured at the moment in our test environment.  We can start the node with or without Hadoop enabled and the job still runs as seen below.  Anything else you might suggest?&#60;/p&#62;
&#60;p&#62;05-07-2011 15:38:31 [http-8080-1]  INFO controller.DataController - Computing targets graphs started at: Tue Jul 05 15:38:31 BST 2011&#60;br /&#62;
05-07-2011 15:38:31 [http-8080-1]  INFO cassandra.AbstractStore - Truncating column family 'data_graphs'&#60;br /&#62;
05-07-2011 15:38:31 [http-8080-1]  INFO hadoop.DependencyGraph - Configuring new DependencyGraph job against host localhost and port 9170&#60;br /&#62;
05-07-2011 15:38:32 [http-8080-1]  INFO jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=&#60;br /&#62;
05-07-2011 15:38:32 [http-8080-1]  INFO hadoop.DependencyGraph - Submitting DependencyGraph job...&#60;br /&#62;
05-07-2011 15:38:32 [http-8080-1]  WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.&#60;br /&#62;
05-07-2011 15:38:33 [http-8080-1]  INFO mapred.JobClient - Running job: job_local_0001&#60;br /&#62;
05-07-2011 15:38:33 [Thread-20]  INFO mapred.MapTask - io.sort.mb = 100&#60;br /&#62;
05-07-2011 15:38:33 [Thread-20]  INFO mapred.MapTask - data buffer = 79691776/99614720&#60;br /&#62;
05-07-2011 15:38:33 [Thread-20]  INFO mapred.MapTask - record buffer = 262144/327680&#60;br /&#62;
05-07-2011 15:38:34 [http-8080-1]  INFO mapred.JobClient -  map 0% reduce 0%&#60;br /&#62;
05-07-2011 15:38:37 [Thread-20]  INFO mapred.MapTask - Spilling map output: record full = true&#60;br /&#62;
05-07-2011 15:38:37 [Thread-20]  INFO mapred.MapTask - bufstart = 0; bufend = 5613510; bufvoid = 99614720&#60;br /&#62;
05-07-2011 15:38:37 [Thread-20]  INFO mapred.MapTask - kvstart = 0; kvend = 262144; length = 327680&#60;br /&#62;
05-07-2011 15:38:38 [SpillThread]  INFO mapred.MapTask - Finished spill 0&#60;br /&#62;
[...]&#60;br /&#62;
05-07-2011 15:42:51 [Thread-20]  INFO mapred.LocalJobRunner - reduce &#38;gt; reduce&#60;br /&#62;
05-07-2011 15:42:51 [Thread-20]  INFO mapred.TaskRunner - Task 'attempt_local_0001_r_000000_0' done.&#60;br /&#62;
05-07-2011 15:42:51 [http-8080-1]  INFO mapred.JobClient -  map 100% reduce 100%&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient - Job complete: job_local_0001&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient - Counters: 12&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -   FileSystemCounters&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     FILE_BYTES_READ=6703653590&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     FILE_BYTES_WRITTEN=12441116494&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -   Map-Reduce Framework&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Reduce input groups=555796&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Combine output records=0&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Map input records=1728478&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Reduce shuffle bytes=0&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Reduce output records=555796&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Spilled Records=56052510&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Map output bytes=317681794&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Combine input records=0&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Map output records=14839085&#60;br /&#62;
05-07-2011 15:42:52 [http-8080-1]  INFO mapred.JobClient -     Reduce input records=14839085
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-259</link>
			<pubDate>Tue, 05 Jul 2011 12:47:20 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">259@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;You don't need to pass the CassandraJobConf, just use the regular JobConf.&#60;/p&#62;
&#60;p&#62;Implementing Tool is the only required step.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>deltafoxtrot on "Submitting a Map-Reduce Job using CassandraJobConf"</title>
			<link>http://www.datastax.com/support-forums/topic/submitting-a-map-reduce-job-using-cassandrajobconf#post-258</link>
			<pubDate>Tue, 05 Jul 2011 12:15:30 +0000</pubDate>
			<dc:creator>deltafoxtrot</dc:creator>
			<guid isPermaLink="false">258@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Apologies in advance if this seems a bit 101, but I'm struggling to submit a Map-Reduce job to the Brisk-managed JobTracker via code.  The class that represents the MR job extends Configured and implements Tool.  In its run method, I pass a new CassandraJobConf to the Job constructor (see code below).  While the job runs, it seems to spawn its own Hadoop framework to process the job as opposed to passing the job along to the Brisk-managed JobTracker (it also doesn't show in the JobTracker or TaskTracker web apps).  I haven't seen an example using the CassandraJobConf, so I'm not sure if this is the intended usage.  Any thoughts would be greatly appreciated.&#60;/p&#62;
&#60;p&#62;	@Override&#60;br /&#62;
	public int run(String[] args) throws Exception {&#60;/p&#62;
&#60;p&#62;		Job job = new Job(new CassandraJobConf(), &#34;dependency.graph&#34;);&#60;br /&#62;
		job.setJarByClass(DependencyGraph.class);&#60;br /&#62;
		job.setMapperClass(DependencyGraphMapper.class);&#60;br /&#62;
		job.setReducerClass(DependencyGraphReducer.class);&#60;/p&#62;
&#60;p&#62;		job.setMapOutputKeyClass(Text.class);&#60;br /&#62;
		job.setMapOutputValueClass(Text.class);&#60;br /&#62;
		job.setOutputKeyClass(ByteBuffer.class);&#60;br /&#62;
		job.setOutputValueClass(List.class);&#60;/p&#62;
&#60;p&#62;		job.setOutputFormatClass(ColumnFamilyOutputFormat.class);&#60;br /&#62;
		job.setInputFormatClass(ColumnFamilyInputFormat.class);&#60;/p&#62;
&#60;p&#62;		ConfigHelper.setRpcPort(job.getConfiguration(), Integer.valueOf(port).toString());&#60;br /&#62;
		ConfigHelper.setInitialAddress(job.getConfiguration(), host);&#60;br /&#62;
		ConfigHelper.setPartitioner(job.getConfiguration(), RandomPartitioner.class.getCanonicalName());&#60;br /&#62;
		ConfigHelper.setInputColumnFamily(job.getConfiguration(), IN_KEYSPACE, IN_COLUMN_FAMILY);&#60;br /&#62;
		ConfigHelper.setOutputColumnFamily(job.getConfiguration(), OUT_KEYSPACE, OUT_COLUMN_FAMILY);&#60;/p&#62;
&#60;p&#62;		SlicePredicate predicate = new SlicePredicate();&#60;br /&#62;
		predicate.setSlice_range(new SliceRange(bytes(null), bytes(null), false, Integer.MAX_VALUE));&#60;/p&#62;
&#60;p&#62;		ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);&#60;/p&#62;
&#60;p&#62;		job.waitForCompletion(true);&#60;/p&#62;
&#60;p&#62;		return 0;&#60;br /&#62;
	}
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
