<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Pig column count limit</title>
		<link>http://www.datastax.com/support-forums/topic/pig-column-count-limit</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Fri, 24 May 2013 22:50:09 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/pig-column-count-limit" rel="self" type="application/rss+xml" />

		<item>
			<title>Anonymous on "Pig column count limit"</title>
			<link>http://www.datastax.com/support-forums/topic/pig-column-count-limit#post-903</link>
			<pubDate>Wed, 21 Dec 2011 09:09:16 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">903@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi everybody,&#60;br /&#62;
I have installed DSE 1.0.1 on a 3-node cluster to test a Pig map/reduce job.&#60;br /&#62;
I want to count the columns in each row of a ColumnFamily with this script:&#60;/p&#62;
&#60;p&#62;rows = LOAD 'cassandra://&#38;lt;keyspace&#38;gt;/&#38;lt;columnfamily&#38;gt;' USING CassandraStorage();&#60;br /&#62;
counted = FOREACH rows GENERATE $0,COUNT_STAR($1);&#60;br /&#62;
STORE counted INTO '/tmp/column-count-result' USING PigStorage();&#60;/p&#62;
&#60;p&#62;It works fine, but by default only 1024 columns are loaded per row.&#60;br /&#62;
So I replaced my LOAD statement with this:&#60;/p&#62;
&#60;p&#62;rows = LOAD 'cassandra://&#38;lt;keyspace&#38;gt;/&#38;lt;columnfamily&#38;gt;?limit=2147483647' USING CassandraStorage();&#60;/p&#62;
&#60;p&#62;where 2147483647 = Integer.MAX_VALUE = the maximum number of columns in a row of a ColumnFamily.&#60;/p&#62;
&#60;p&#62;Unfortunately, TimedOutException and OutOfMemoryError: Java heap space errors then appear in the Cassandra logs, and the job fails.&#60;br /&#62;
So I tried adjusting the configuration as described here:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting&#34; rel=&#34;nofollow&#34;&#62;http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;But the result was the same.&#60;/p&#62;
&#60;p&#62;Does anybody have an idea how to configure DSE in order to count a large number of columns?&#60;/p&#62;
&#60;p&#62;Thanks
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
