<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Aggregate data by columns in Pig</title>
		<link>http://www.datastax.com/support-forums/topic/agregate-data-by-columns-in-pig</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Sat, 25 May 2013 01:59:16 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/agregate-data-by-columns-in-pig" rel="self" type="application/rss+xml" />

		<item>
			<title>Anonymous on "Aggregate data by columns in Pig"</title>
			<link>http://www.datastax.com/support-forums/topic/agregate-data-by-columns-in-pig#post-1828</link>
			<pubDate>Thu, 03 May 2012 15:28:19 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1828@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Would it help to flatten the bag first? You can then use FOREACH or GROUP to rearrange the data.&#60;/p&#62;
&#60;p&#62;e.g.&#60;br /&#62;
&#60;code&#62;&#60;br /&#62;
p1 = LOAD 'cassandra://keyspace/CF' USING CassandraStorage() AS (key:chararray, columns: bag{T: tuple(property:chararray, value:chararray)});&#60;br /&#62;
p2 = FOREACH p1 GENERATE key, flatten (columns);&#60;br /&#62;
&#60;/code&#62;&#60;/p&#62;
&#60;p&#62;You'll get tuples of (key, property, value), which you can then filter, split, group, and so on.&#60;/p&#62;
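&#60;p&#62;To make the flatten-then-group idea concrete, here is a minimal Python sketch that simulates it on the sample rows from the original question in this thread. It is not Pig and does not talk to Cassandra: the dictionary, the helper names flatten and sum_by_key, and the choice of SUM as the aggregate are all assumptions made for illustration.&#60;/p&#62;
&#60;pre&#62;
```python
from collections import defaultdict

# Sample rows copied from the question: row key -> list of (column, value) pairs.
# This plain-Python structure is only a stand-in for the (key, columns) bag that
# CassandraStorage hands to Pig; it is an illustration, not the Pig API.
rows = {
    "123:1333065600000": [(16000, 331), (76000, 314), (136000, 333)],
    "123:1332979200000": [(6000, 300), (66000, 302), (126000, 303), (186000, 234),
                          (246000, 445), (306000, 331), (366000, 455)],
    "121:1334102400000": [(68608000, 12), (68668000, 12)],
}


def flatten(rows):
    """Mimic FOREACH ... GENERATE key, flatten(columns): one tuple per column."""
    return [(key, column, value)
            for key, columns in rows.items()
            for column, value in columns]


def sum_by_key(flat):
    """Mimic GROUP flat BY key followed by SUM over the value field."""
    totals = defaultdict(int)
    for key, _column, value in flat:
        totals[key] += value
    return dict(totals)


flat = flatten(rows)                      # (key, column, value) tuples
per_key = sum_by_key(flat)                # one total per row key
grand_total = sum(value for _key, _column, value in flat)  # like GROUP ... ALL

print(len(flat), grand_total)
print(per_key)
```
&#60;/pre&#62;
&#60;p&#62;Because every column becomes its own tuple after flattening, rows with different numbers of columns need no special handling in the aggregation step.&#60;/p&#62;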
&#60;p&#62;That said, let me know if you are able to use GROUP BY or JOIN on the key. I keep getting an error; I recently posted about it but have had no replies yet.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Aggregate data by columns in Pig"</title>
			<link>http://www.datastax.com/support-forums/topic/agregate-data-by-columns-in-pig#post-1821</link>
			<pubDate>Wed, 02 May 2012 16:47:39 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1821@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Probably the only way is to add some or all of the columns to the column family metadata, so that they appear as separate fields in the Pig schema.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Aggregate data by columns in Pig"</title>
			<link>http://www.datastax.com/support-forums/topic/agregate-data-by-columns-in-pig#post-1820</link>
			<pubDate>Wed, 02 May 2012 16:35:12 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1820@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I have data in Cassandra with the following structure:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;[default@data] list values;
Using default limit of 100
-------------------
RowKey: 123:1333065600000
=&#38;gt; (column=16000, value=331, timestamp=1333724446041000)
=&#38;gt; (column=76000, value=314, timestamp=1333724446042000)
=&#38;gt; (column=136000, value=333, timestamp=1333724446043000)
RowKey: 123:1332979200000
=&#38;gt; (column=6000, value=300, timestamp=1333743661692000)
=&#38;gt; (column=66000, value=302, timestamp=1333743661692001)
=&#38;gt; (column=126000, value=303, timestamp=1333743661694000)
=&#38;gt; (column=186000, value=234, timestamp=1333743661695000)
=&#38;gt; (column=246000, value=445, timestamp=1333743661696000)
=&#38;gt; (column=306000, value=331, timestamp=1333743661696001)
=&#38;gt; (column=366000, value=455, timestamp=1333743661698000)
RowKey: 121:1334102400000
=&#38;gt; (column=68608000, value=12, timestamp=1334173122715000)
=&#38;gt; (column=68668000, value=12, timestamp=1334173122715001)&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;In other words, the number of columns can differ from row to row. I need to aggregate all of the values using Pig, and I cannot change the structure. Any idea how to do that?
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
