<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; User Favorites: patrick</title>
		<link>http://www.datastax.com/support-forums/profile/patrick</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Sun, 19 May 2013 11:34:39 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/profile/" rel="self" type="application/rss+xml" />

		<item>
			<title>Patrick on "Cassandra and Python M/R - ColumnFamilyInputFormat and mrjob"</title>
			<link>http://www.datastax.com/support-forums/topic/cassandra-and-python-mr-columnfamilyinputformat-and-mrjob#post-8809</link>
			<pubDate>Wed, 30 Jan 2013 18:57:01 +0000</pubDate>
			<dc:creator>Patrick</dc:creator>
			<guid isPermaLink="false">8809@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Has anyone had any luck using Cassandra as an input source for map/reduce (AWS EMR) jobs using the Python mrjob toolkit?&#60;/p&#62;
&#60;p&#62;I am using the distribution from apache-cassandra-1.1.9-bin.tar.gz. The input format is the ColumnFamilyInputFormat from apache-cassandra-1.1.9/lib/apache-cassandra-1.1.9.jar.&#60;/p&#62;
&#60;p&#62;I have configured the job to use the input format by&#60;br /&#62;
* Setting self.HADOOP_INPUT_FORMAT = 'org.apache.cassandra.hadoop.ColumnFamilyInputFormat'&#60;br /&#62;
* Defining the configuration via jobconf, using the variables cassandra.input.keyspace, cassandra.input.columnfamily, cassandra.input.predicate, ...&#60;br /&#62;
The job seems to find the input format class successfully.&#60;/p&#62;
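&#60;p&#62;As a rough illustration of the configuration described above, the settings can be collected as jobconf entries like this. This is only a sketch: the helper names cassandra_jobconf and jobconf_args are hypothetical, the keyspace, column family, host, port, and partitioner values are placeholders, and cassandra.input.predicate is left out because its value is a serialized slice predicate rather than a plain string.&#60;/p&#62;

```python
# Sketch only: collects the Cassandra input settings as Hadoop jobconf entries.
# All values passed in (keyspace, column family, host) are placeholders.

# Input format class named in the post; set on the job as HADOOP_INPUT_FORMAT.
HADOOP_INPUT_FORMAT = 'org.apache.cassandra.hadoop.ColumnFamilyInputFormat'


def cassandra_jobconf(keyspace, column_family, host, port=9160):
    """Return jobconf entries for the Cassandra input format (hypothetical helper)."""
    return {
        'cassandra.input.keyspace': keyspace,
        'cassandra.input.columnfamily': column_family,
        'cassandra.input.thrift.address': host,
        'cassandra.input.thrift.port': str(port),
        # Assumed partitioner; it must match the one the cluster actually uses.
        'cassandra.input.partitioner.class': 'org.apache.cassandra.dht.RandomPartitioner',
    }


def jobconf_args(jobconf):
    """Render the settings as repeated --jobconf KEY=VALUE command-line arguments."""
    args = []
    for key in sorted(jobconf):
        args.extend(['--jobconf', '%s=%s' % (key, jobconf[key])])
    return args
```

&#60;p&#62;The same dictionary could instead be assigned to the job class as its JOBCONF attribute. Either way, this only covers the configuration side and does not address the input path question below.&#60;/p&#62;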
&#60;p&#62;The problem is that mrjob seems to require that some input path be passed to the job.  For the standard input formats, the input path points to a local file (that is uploaded to S3), or an S3 object, etc.  What should the input path be to indicate that it should pull data from the Cassandra cluster using configuration defined in the jobconf?  Is there some notation like cassandra://... that should be used here?  Between code in runner.py and emr.py, mrjob enforces that some input path be defined, or it defaults to stdin.  Even if I make changes to the code to allow an empty list of inputs, the framework does not seem to delegate to the Cassandra input format.&#60;/p&#62;
&#60;p&#62;Has anyone been able to get this to work?  What steps am I missing?&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
patrick
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
