<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; User Favorites: cko</title>
		<link>http://www.datastax.com/support-forums/profile/cko</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Sun, 19 May 2013 02:08:15 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/profile/" rel="self" type="application/rss+xml" />

		<item>
			<title>cko on "Pig relation schema incorrectly included an extra tuple after creating Cassandra secondary index"</title>
			<link>http://www.datastax.com/support-forums/topic/pig-relation-schema-incorrectly-included-an-extra-tuple-after-creating-cassandra-secondary-index#post-8316</link>
			<pubDate>Mon, 07 Jan 2013 22:10:11 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">8316@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I found a solution myself.&#60;/p&#62;
&#60;p&#62;The extraneous tuple &#34;udid&#34; came from the Cassandra column metadata defined when the secondary index was created on the column &#34;udid&#34;.&#60;/p&#62;
&#60;p&#62;Looking more closely at the output of &#34;DUMP interaction&#34;, Pig did include a tuple for the Cassandra column &#34;udid&#34;, but that column was no longer included in the bag at the end of the relation schema. The udid FILTER in the FOREACH statement therefore matched nothing, and no rows from Cassandra made it into the output.&#60;/p&#62;
&#60;p&#62;Removing that FILTER from the FOREACH and referencing udid.value in the tuple directly solved the problem.&#60;/p&#62;
&#60;p&#62;flattenedInteraction = FOREACH interaction {&#60;br /&#62;
        session = FILTER columns by name == 'sessionId';&#60;br /&#62;
        GENERATE&#60;br /&#62;
            udid.value as udid,&#60;br /&#62;
            flatten(session.value) as sessionId;&#60;br /&#62;
}
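&#60;/p&#62;
&#60;p&#62;As a rough illustration of why this works, here is a minimal Python model of the post-index record shape from &#34;DUMP interaction&#34;. It is not Pig or CassandraStorage code: Python tuples and lists stand in for Pig tuples and bags, and the names Record, rows and flattened_interaction are hypothetical. It assumes FLATTEN emits one output row per bag element.&#60;/p&#62;
&#60;pre&#62;
```python
# Hypothetical model, not Pig code: Python tuples and lists stand in for
# Pig tuples and bags, using the post-index shape from "DUMP interaction":
# (key, ("udid", value), {("sessionId", value)})
Record = tuple[str, tuple[str, str], list[tuple[str, str]]]

UDID = "996953be-917e-4e8b-9e1c-1a49d54c48d0"

rows: list[Record] = [
    ("37ce57bd-d09f-4ac3-9b26-2c0bb1fda101", ("udid", UDID), [("sessionId", "234")]),
    ("1111111111-22222222", ("udid", UDID), [("sessionId", "222")]),
    ("20121221-161023.505#" + UDID, ("udid", UDID), [("sessionId", "123")]),
]


def flattened_interaction(records: list[Record]) -> list[tuple[str, str]]:
    """Mimic the fixed FOREACH: take udid.value from the separate tuple and
    flatten the sessionId values that remain in the columns bag."""
    out = []
    for _key, udid, columns in records:
        session = [value for name, value in columns if name == "sessionId"]
        # FLATTEN emits one row per bag element; an empty bag emits none.
        for session_id in session:
            out.append((udid[1], session_id))
    return out


print(flattened_interaction(rows))
```
&#60;/pre&#62;
&#60;p&#62;Because udid is read from its own tuple instead of being filtered out of the bag, every record still has one sessionId to flatten, so all three rows come back.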
&#60;/p&#62;</description>
		</item>
		<item>
			<title>schumacr on "Pig unable to connect to Cassandra that requires authentication"</title>
			<link>http://www.datastax.com/support-forums/topic/pig-unable-to-connect-to-cassandra-that-requires-authentication#post-8314</link>
			<pubDate>Mon, 07 Jan 2013 16:48:41 +0000</pubDate>
			<dc:creator>schumacr</dc:creator>
			<guid isPermaLink="false">8314@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Typically we allow a 3-4 month lag between a new C* release and its integration into DSE. This gives the release time to bake in the community and to go through our certification program, which ensures it is ready for production. For 1.2, I would expect integration into DSE sometime in late Q1 or early Q2.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cko on "Pig relation schema incorrectly included an extra tuple after creating Cassandra secondary index"</title>
			<link>http://www.datastax.com/support-forums/topic/pig-relation-schema-incorrectly-included-an-extra-tuple-after-creating-cassandra-secondary-index#post-8309</link>
			<pubDate>Mon, 07 Jan 2013 06:28:53 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">8309@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi,&#60;/p&#62;
&#60;p&#62;I have a Cassandra column family InteractionCF. Two of its columns are named udid and sessionId.&#60;/p&#62;
&#60;p&#62;I ran a Pig job like this to extract the rows:&#60;/p&#62;
&#60;p&#62;interaction = LOAD 'cassandra://DEVKS/InteractionCF' USING CassandraStorage();&#60;br /&#62;
DESCRIBE interaction;&#60;br /&#62;
DUMP interaction;&#60;br /&#62;
flattenedInteraction = FOREACH interaction {&#60;br /&#62;
         device = FILTER columns by name == 'udid';&#60;br /&#62;
         session = FILTER columns by name == 'sessionId';&#60;br /&#62;
         GENERATE&#60;br /&#62;
             flatten(device.value) as udid,&#60;br /&#62;
             flatten(session.value) as sessionId;&#60;br /&#62;
}&#60;br /&#62;
DUMP flattenedInteraction;&#60;/p&#62;
&#60;p&#62;The relation schema shown by DESCRIBE is:&#60;br /&#62;
interaction: {key: chararray,columns: {(name: chararray,value: chararray)}}&#60;/p&#62;
&#60;p&#62;The output data of the last DUMP was fine.&#60;/p&#62;
&#60;p&#62;I then used cassandra-cli to create a secondary index on the udid column:&#60;/p&#62;
&#60;p&#62;update column family InteractionCF with&#60;br /&#62;
	column_metadata =&#60;br /&#62;
	[&#60;br /&#62;
	{column_name: udid, validation_class: UTF8Type, index_type: KEYS}&#60;br /&#62;
	];&#60;/p&#62;
&#60;p&#62;When I ran the same Pig job again, the relation schema contained an extra tuple, udid:&#60;/p&#62;
&#60;p&#62;interaction: {key: chararray,udid: (name: chararray,value: chararray),columns: {(name: chararray,value: chararray)}}&#60;/p&#62;
&#60;p&#62;&#34;DUMP interaction&#34; appeared to return the same data as the first run without the secondary index, but &#34;DUMP flattenedInteraction&#34; was empty. Apparently the problem was caused by the extraneous udid tuple in the relation schema, which did not match the actual data returned by the Cassandra storage handler.&#60;/p&#62;
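&#60;p&#62;The empty result can be reproduced with a minimal Python model of the two record shapes printed by &#34;DUMP interaction&#34; in the logs below. This is a sketch, not Pig code: Python tuples and lists stand in for Pig tuples and bags, the names before, after, foreach_filter_flatten and run are hypothetical, and it assumes Pig's FLATTEN semantics, where flattening two bags in one GENERATE yields their cross product and an empty bag yields no row.&#60;/p&#62;
&#60;pre&#62;
```python
# Hypothetical model, not Pig code: Python tuples and lists stand in for
# Pig tuples and bags, using the record shapes printed by "DUMP interaction".
UDID = "996953be-917e-4e8b-9e1c-1a49d54c48d0"

# Before the index: every column is a (name, value) pair inside the bag.
before = [
    ("37ce57bd-d09f-4ac3-9b26-2c0bb1fda101", [("sessionId", "234"), ("udid", UDID)]),
    ("1111111111-22222222", [("sessionId", "222"), ("udid", UDID)]),
]

# After the index: udid moves into its own tuple; the bag keeps the rest.
after = [
    ("37ce57bd-d09f-4ac3-9b26-2c0bb1fda101", ("udid", UDID), [("sessionId", "234")]),
    ("1111111111-22222222", ("udid", UDID), [("sessionId", "222")]),
]


def foreach_filter_flatten(columns):
    """Mimic the original FOREACH body on one bag of (name, value) columns."""
    device = [value for name, value in columns if name == "udid"]
    session = [value for name, value in columns if name == "sessionId"]
    # Two FLATTENs in one GENERATE yield their cross product, so an
    # empty bag on either side produces no output row for this record.
    return [(d, s) for d in device for s in session]


def run(records):
    # The columns bag is the last field in both record shapes.
    return [row for record in records for row in foreach_filter_flatten(record[-1])]


print(run(before))  # one (udid, sessionId) row per record
print(run(after))   # [] because no udid column is left in the bag
```
&#60;/pre&#62;
&#60;p&#62;In the indexed shape the udid column is no longer inside the columns bag, so the udid FILTER always returns an empty bag and every record is dropped, which matches the 0 records stored by the second flattenedInteraction job.&#60;/p&#62;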
&#60;p&#62;When I tried to supply the schema explicitly in the LOAD statement, excluding the extraneous udid tuple, Pig complained that the schema was incompatible:&#60;/p&#62;
&#60;p&#62;grunt&#38;gt; interaction = LOAD 'cassandra://DEVKS/InteractionCF' USING CassandraStorage() AS (rowKey:chararray, columns:bag{T:tuple(columnName:chararray, columnValue:chararray)});&#60;/p&#62;
&#60;p&#62;2013-01-07 16:10:52,314 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is &#34;rowKey:chararray,columns:bag{T:tuple(columnName:chararray,columnValue:chararray)}&#34;, right is &#34;key:chararray,udid:tuple(name:chararray,value:chararray),columns:bag{:tuple(name:chararray,value:chararray)}&#34;&#60;/p&#62;
&#60;p&#62;************ Pig job output without secondary index *******************&#60;/p&#62;
&#60;p&#62;grunt&#38;gt; interaction = LOAD 'cassandra://DEVKS/InteractionCF' USING CassandraStorage();&#60;br /&#62;
grunt&#38;gt; DESCRIBE interaction;&#60;br /&#62;
interaction: {key: chararray,columns: {(name: chararray,value: chararray)}}&#60;br /&#62;
grunt&#38;gt; DUMP interaction;&#60;br /&#62;
2013-01-07 15:51:25,914 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable&#60;br /&#62;
2013-01-07 15:51:26,268 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN&#60;br /&#62;
2013-01-07 15:51:26,350 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false&#60;br /&#62;
2013-01-07 15:51:26,369 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1&#60;br /&#62;
2013-01-07 15:51:26,369 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1&#60;br /&#62;
2013-01-07 15:51:26,518 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job&#60;br /&#62;
2013-01-07 15:51:26,531 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3&#60;br /&#62;
2013-01-07 15:51:26,532 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job1144508861314187229.jar&#60;br /&#62;
2013-01-07 15:51:28,302 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job1144508861314187229.jar created&#60;br /&#62;
2013-01-07 15:51:28,317 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job&#60;br /&#62;
2013-01-07 15:51:28,344 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.&#60;br /&#62;
2013-01-07 15:51:28,530 [Thread-5] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5&#60;br /&#62;
2013-01-07 15:51:28,845 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201212300711_0020&#60;br /&#62;
2013-01-07 15:51:28,845 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: &#60;a href=&#34;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0020&#34; rel=&#34;nofollow&#34;&#62;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0020&#60;/a&#62;&#60;br /&#62;
2013-01-07 15:51:28,846 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete&#60;br /&#62;
2013-01-07 15:51:39,375 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete&#60;br /&#62;
2013-01-07 15:51:40,377 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 20% complete&#60;br /&#62;
2013-01-07 15:51:42,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 30% complete&#60;br /&#62;
2013-01-07 15:51:43,385 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 40% complete&#60;br /&#62;
2013-01-07 15:51:45,390 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete&#60;br /&#62;
2013-01-07 15:51:53,944 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete&#60;br /&#62;
2013-01-07 15:51:53,945 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:&#60;/p&#62;
&#60;p&#62;HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features&#60;br /&#62;
1.0.2-dse-2.2.1 0.9.2   cassadm 2013-01-07 15:51:26     2013-01-07 15:51:53     UNKNOWN&#60;/p&#62;
&#60;p&#62;Success!&#60;/p&#62;
&#60;p&#62;Job Stats (time in seconds):&#60;br /&#62;
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs&#60;br /&#62;
job_201212300711_0020   5       0       6       4       5       0       0       0       interaction     MAP_ONLY        cfs:/tmp/temp-199271653/tmp236958936,&#60;/p&#62;
&#60;p&#62;Input(s):&#60;br /&#62;
Successfully read 3 records from: &#34;cassandra://DEVKS/InteractionCF&#34;&#60;/p&#62;
&#60;p&#62;Output(s):&#60;br /&#62;
Successfully stored 3 records in: &#34;cfs:/tmp/temp-199271653/tmp236958936&#34;&#60;/p&#62;
&#60;p&#62;Counters:&#60;br /&#62;
Total records written : 3&#60;br /&#62;
Total bytes written : 0&#60;br /&#62;
Spillable Memory Manager spill count : 0&#60;br /&#62;
Total bags proactively spilled: 0&#60;br /&#62;
Total records proactively spilled: 0&#60;/p&#62;
&#60;p&#62;Job DAG:&#60;br /&#62;
job_201212300711_0020&#60;/p&#62;
&#60;p&#62;2013-01-07 15:51:53,953 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!&#60;br /&#62;
2013-01-07 15:51:53,974 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 5&#60;br /&#62;
2013-01-07 15:51:53,974 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 5&#60;/p&#62;
&#60;p&#62;(37ce57bd-d09f-4ac3-9b26-2c0bb1fda101,{(sessionId,234),(udid,996953be-917e-4e8b-9e1c-1a49d54c48d0)})&#60;br /&#62;
(1111111111-22222222,{(sessionId,222),(udid,996953be-917e-4e8b-9e1c-1a49d54c48d0)})&#60;br /&#62;
(20121221-161023.505#996953be-917e-4e8b-9e1c-1a49d54c48d0,{(sessionId,123),(udid,996953be-917e-4e8b-9e1c-1a49d54c48d0)})&#60;/p&#62;
&#60;p&#62;grunt&#38;gt; flattenedInteraction = FOREACH interaction {&#60;br /&#62;
&#38;gt;&#38;gt;          device = FILTER columns by name == 'udid';&#60;br /&#62;
&#38;gt;&#38;gt;          session = FILTER columns by name == 'sessionId';&#60;br /&#62;
&#38;gt;&#38;gt;          GENERATE&#60;br /&#62;
&#38;gt;&#38;gt;              flatten(device.value) as udid,&#60;br /&#62;
&#38;gt;&#38;gt;              flatten(session.value) as sessionId;&#60;br /&#62;
&#38;gt;&#38;gt;  }&#60;br /&#62;
grunt&#38;gt; DUMP flattenedInteraction;&#60;br /&#62;
2013-01-07 15:51:54,089 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN&#60;br /&#62;
2013-01-07 15:51:54,099 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for interaction: $0&#60;br /&#62;
2013-01-07 15:51:54,118 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false&#60;br /&#62;
2013-01-07 15:51:54,120 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1&#60;br /&#62;
2013-01-07 15:51:54,120 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1&#60;br /&#62;
2013-01-07 15:51:54,122 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job&#60;br /&#62;
2013-01-07 15:51:54,123 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3&#60;br /&#62;
2013-01-07 15:51:54,124 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job3897565856870854991.jar&#60;br /&#62;
2013-01-07 15:51:55,719 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job3897565856870854991.jar created&#60;br /&#62;
2013-01-07 15:51:55,726 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job&#60;br /&#62;
2013-01-07 15:51:55,746 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.&#60;br /&#62;
2013-01-07 15:51:55,830 [Thread-6] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5&#60;br /&#62;
2013-01-07 15:51:56,246 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201212300711_0021&#60;br /&#62;
2013-01-07 15:51:56,247 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: &#60;a href=&#34;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0021&#34; rel=&#34;nofollow&#34;&#62;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0021&#60;/a&#62;&#60;br /&#62;
2013-01-07 15:51:56,248 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete&#60;br /&#62;
2013-01-07 15:52:06,772 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete&#60;br /&#62;
2013-01-07 15:52:07,774 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 20% complete&#60;br /&#62;
2013-01-07 15:52:09,779 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 30% complete&#60;br /&#62;
2013-01-07 15:52:10,781 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 40% complete&#60;br /&#62;
2013-01-07 15:52:12,786 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete&#60;br /&#62;
2013-01-07 15:52:21,312 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete&#60;br /&#62;
2013-01-07 15:52:21,313 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:&#60;/p&#62;
&#60;p&#62;HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features&#60;br /&#62;
1.0.2-dse-2.2.1 0.9.2   cassadm 2013-01-07 15:51:54     2013-01-07 15:52:21     UNKNOWN&#60;/p&#62;
&#60;p&#62;Success!&#60;/p&#62;
&#60;p&#62;Job Stats (time in seconds):&#60;br /&#62;
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs&#60;br /&#62;
job_201212300711_0021   5       0       6       4       5       0       0       0       flattenedInteraction,interaction        MAP_ONLY        cfs:/tmp/temp-199271653/tmp-1710812623,&#60;/p&#62;
&#60;p&#62;Input(s):&#60;br /&#62;
Successfully read 3 records from: &#34;cassandra://DEVKS/InteractionCF&#34;&#60;/p&#62;
&#60;p&#62;Output(s):&#60;br /&#62;
Successfully stored 3 records in: &#34;cfs:/tmp/temp-199271653/tmp-1710812623&#34;&#60;/p&#62;
&#60;p&#62;Counters:&#60;br /&#62;
Total records written : 3&#60;br /&#62;
Total bytes written : 0&#60;br /&#62;
Spillable Memory Manager spill count : 0&#60;br /&#62;
Total bags proactively spilled: 0&#60;br /&#62;
Total records proactively spilled: 0&#60;/p&#62;
&#60;p&#62;Job DAG:&#60;br /&#62;
job_201212300711_0021&#60;/p&#62;
&#60;p&#62;2013-01-07 15:52:21,319 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!&#60;br /&#62;
2013-01-07 15:52:21,326 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 5&#60;br /&#62;
2013-01-07 15:52:21,326 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 5&#60;/p&#62;
&#60;p&#62;(996953be-917e-4e8b-9e1c-1a49d54c48d0,123)&#60;br /&#62;
(996953be-917e-4e8b-9e1c-1a49d54c48d0,234)&#60;br /&#62;
(996953be-917e-4e8b-9e1c-1a49d54c48d0,222)&#60;/p&#62;
&#60;p&#62;************ Pig job output with secondary index *******************&#60;/p&#62;
&#60;p&#62;grunt&#38;gt; interaction = LOAD 'cassandra://DEVKS/InteractionCF' USING CassandraStorage();&#60;/p&#62;
&#60;p&#62;grunt&#38;gt; DESCRIBE interaction;&#60;br /&#62;
interaction: {key: chararray,udid: (name: chararray,value: chararray),columns: {(name: chararray,value: chararray)}}&#60;/p&#62;
&#60;p&#62;grunt&#38;gt; DUMP interaction;&#60;br /&#62;
2013-01-07 15:56:15,948 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable&#60;br /&#62;
2013-01-07 15:56:15,981 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN&#60;br /&#62;
2013-01-07 15:56:16,062 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false&#60;br /&#62;
2013-01-07 15:56:16,091 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1&#60;br /&#62;
2013-01-07 15:56:16,092 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1&#60;br /&#62;
2013-01-07 15:56:16,238 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job&#60;br /&#62;
2013-01-07 15:56:16,251 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3&#60;br /&#62;
2013-01-07 15:56:16,251 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2101224938111435909.jar&#60;br /&#62;
2013-01-07 15:56:18,003 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2101224938111435909.jar created&#60;br /&#62;
2013-01-07 15:56:18,019 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job&#60;br /&#62;
2013-01-07 15:56:18,045 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.&#60;br /&#62;
2013-01-07 15:56:18,441 [Thread-5] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5&#60;br /&#62;
2013-01-07 15:56:18,545 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete&#60;br /&#62;
2013-01-07 15:56:19,083 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201212300711_0022&#60;br /&#62;
2013-01-07 15:56:19,083 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: &#60;a href=&#34;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0022&#34; rel=&#34;nofollow&#34;&#62;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0022&#60;/a&#62;&#60;br /&#62;
2013-01-07 15:56:29,111 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete&#60;br /&#62;
2013-01-07 15:56:30,113 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 20% complete&#60;br /&#62;
2013-01-07 15:56:32,128 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 30% complete&#60;br /&#62;
2013-01-07 15:56:33,130 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 40% complete&#60;br /&#62;
2013-01-07 15:56:35,135 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete&#60;br /&#62;
2013-01-07 15:56:43,681 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete&#60;br /&#62;
2013-01-07 15:56:43,683 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:&#60;/p&#62;
&#60;p&#62;HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features&#60;br /&#62;
1.0.2-dse-2.2.1 0.9.2   cassadm 2013-01-07 15:56:16     2013-01-07 15:56:43     UNKNOWN&#60;/p&#62;
&#60;p&#62;Success!&#60;/p&#62;
&#60;p&#62;Job Stats (time in seconds):&#60;br /&#62;
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs&#60;br /&#62;
job_201212300711_0022   5       0       6       4       5       0       0       0       interaction     MAP_ONLY        cfs:/tmp/temp-444607507/tmp-1768433647,&#60;/p&#62;
&#60;p&#62;Input(s):&#60;br /&#62;
Successfully read 3 records from: &#34;cassandra://DEVKS/InteractionCF&#34;&#60;/p&#62;
&#60;p&#62;Output(s):&#60;br /&#62;
Successfully stored 3 records in: &#34;cfs:/tmp/temp-444607507/tmp-1768433647&#34;&#60;/p&#62;
&#60;p&#62;Counters:&#60;br /&#62;
Total records written : 3&#60;br /&#62;
Total bytes written : 0&#60;br /&#62;
Spillable Memory Manager spill count : 0&#60;br /&#62;
Total bags proactively spilled: 0&#60;br /&#62;
Total records proactively spilled: 0&#60;/p&#62;
&#60;p&#62;Job DAG:&#60;br /&#62;
job_201212300711_0022&#60;/p&#62;
&#60;p&#62;2013-01-07 15:56:43,690 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!&#60;br /&#62;
2013-01-07 15:56:43,711 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 5&#60;br /&#62;
2013-01-07 15:56:43,711 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 5&#60;/p&#62;
&#60;p&#62;(37ce57bd-d09f-4ac3-9b26-2c0bb1fda101,(udid,996953be-917e-4e8b-9e1c-1a49d54c48d0),{(sessionId,234)})&#60;br /&#62;
(1111111111-22222222,(udid,996953be-917e-4e8b-9e1c-1a49d54c48d0),{(sessionId,222)})&#60;br /&#62;
(20121221-161023.505#996953be-917e-4e8b-9e1c-1a49d54c48d0,(udid,996953be-917e-4e8b-9e1c-1a49d54c48d0),{(sessionId,123)})&#60;/p&#62;
&#60;p&#62;grunt&#38;gt; flattenedInteraction = FOREACH interaction {&#60;br /&#62;
&#38;gt;&#38;gt;          device = FILTER columns by name == 'udid';&#60;br /&#62;
&#38;gt;&#38;gt;          session = FILTER columns by name == 'sessionId';&#60;br /&#62;
&#38;gt;&#38;gt;          GENERATE&#60;br /&#62;
&#38;gt;&#38;gt;              flatten(device.value) as udid,&#60;br /&#62;
&#38;gt;&#38;gt;              flatten(session.value) as sessionId;&#60;br /&#62;
&#38;gt;&#38;gt;  }&#60;br /&#62;
grunt&#38;gt; DUMP flattenedInteraction;&#60;br /&#62;
2013-01-07 15:56:43,820 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN&#60;br /&#62;
2013-01-07 15:56:43,829 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for interaction: $0, $1&#60;br /&#62;
2013-01-07 15:56:43,848 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false&#60;br /&#62;
2013-01-07 15:56:43,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1&#60;br /&#62;
2013-01-07 15:56:43,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1&#60;br /&#62;
2013-01-07 15:56:43,852 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job&#60;br /&#62;
2013-01-07 15:56:43,853 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3&#60;br /&#62;
2013-01-07 15:56:43,854 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job8930437387737732767.jar&#60;br /&#62;
2013-01-07 15:56:45,421 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job8930437387737732767.jar created&#60;br /&#62;
2013-01-07 15:56:45,428 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job&#60;br /&#62;
2013-01-07 15:56:45,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.&#60;br /&#62;
2013-01-07 15:56:45,526 [Thread-6] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5&#60;br /&#62;
2013-01-07 15:56:45,949 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201212300711_0023&#60;br /&#62;
2013-01-07 15:56:45,950 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: &#60;a href=&#34;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0023&#34; rel=&#34;nofollow&#34;&#62;http://10.69.12.123:50030/jobdetails.jsp?jobid=job_201212300711_0023&#60;/a&#62;&#60;br /&#62;
2013-01-07 15:56:45,951 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete&#60;br /&#62;
2013-01-07 15:56:55,974 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete&#60;br /&#62;
2013-01-07 15:56:56,976 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 20% complete&#60;br /&#62;
2013-01-07 15:56:58,980 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 30% complete&#60;br /&#62;
2013-01-07 15:56:59,983 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 40% complete&#60;br /&#62;
2013-01-07 15:57:01,988 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete&#60;br /&#62;
2013-01-07 15:57:11,019 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete&#60;br /&#62;
2013-01-07 15:57:11,019 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:&#60;/p&#62;
&#60;p&#62;HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features&#60;br /&#62;
1.0.2-dse-2.2.1 0.9.2   cassadm 2013-01-07 15:56:43     2013-01-07 15:57:11     UNKNOWN&#60;/p&#62;
&#60;p&#62;Success!&#60;/p&#62;
&#60;p&#62;Job Stats (time in seconds):&#60;br /&#62;
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs&#60;br /&#62;
job_201212300711_0023   5       0       6       4       5       0       0       0       flattenedInteraction,interaction        MAP_ONLY        cfs:/tmp/temp-444607507/tmp-292839261,&#60;/p&#62;
&#60;p&#62;Input(s):&#60;br /&#62;
Successfully read 3 records from: &#34;cassandra://DEVKS/InteractionCF&#34;&#60;/p&#62;
&#60;p&#62;Output(s):&#60;br /&#62;
Successfully stored 0 records in: &#34;cfs:/tmp/temp-444607507/tmp-292839261&#34;&#60;/p&#62;
&#60;p&#62;Counters:&#60;br /&#62;
Total records written : 0&#60;br /&#62;
Total bytes written : 0&#60;br /&#62;
Spillable Memory Manager spill count : 0&#60;br /&#62;
Total bags proactively spilled: 0&#60;br /&#62;
Total records proactively spilled: 0&#60;/p&#62;
&#60;p&#62;Job DAG:&#60;br /&#62;
job_201212300711_0023&#60;/p&#62;
&#60;p&#62;2013-01-07 15:57:11,025 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!&#60;br /&#62;
2013-01-07 15:57:11,036 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 5&#60;br /&#62;
2013-01-07 15:57:11,036 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 5
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cko on "Pig unable to connect to Cassandra that requires authentication"</title>
			<link>http://www.datastax.com/support-forums/topic/pig-unable-to-connect-to-cassandra-that-requires-authentication#post-8306</link>
			<pubDate>Mon, 07 Jan 2013 00:39:52 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">8306@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;According to &#60;a href=&#34;https://issues.apache.org/jira/browse/CASSANDRA-3042&#34; rel=&#34;nofollow&#34;&#62;https://issues.apache.org/jira/browse/CASSANDRA-3042&#60;/a&#62;, authentication support was added to the Pig load function CassandraStorage in Cassandra 1.2.0.&#60;/p&#62;
&#60;p&#62;DSE 2.2.1 currently ships with Cassandra 1.1.6. Is there a schedule for when DSE will include Cassandra 1.2.0?&#60;/p&#62;
&#60;p&#62;Is there any plan to support Pig authentication against Cassandra in DSE once it includes Cassandra 1.2.0?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>mbulman on "Opscenter - Cluster -List view not showing the correct size"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-cluster-list-view-not-showing-the-correct-size#post-8246</link>
			<pubDate>Mon, 31 Dec 2012 16:43:19 +0000</pubDate>
			<dc:creator>mbulman</dc:creator>
			<guid isPermaLink="false">8246@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;This is most likely caused by a known issue where errors in data collection prevent new data from being pushed. Unfortunately, the same issue also keeps those errors from being displayed properly, which makes it harder to diagnose. It should be fixed in the next OpsCenter patch release (2.1.3), which will be out in the near future. Keep an eye on our dev blog for the announcement.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cko on "Opscenter - Cluster -List view not showing the correct size"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-cluster-list-view-not-showing-the-correct-size#post-8223</link>
			<pubDate>Sun, 30 Dec 2012 23:29:54 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">8223@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi, &#60;/p&#62;
&#60;p&#62;We are also seeing the same symptom in OpsCenter. &#60;/p&#62;
&#60;p&#62;After we restarted the OpsCenter agent on one node, the disk usage shown in OpsCenter for that node was correct again.&#60;/p&#62;
&#60;p&#62;It appears that the agent does not pick up any growth in disk usage after it has started.&#60;/p&#62;
&#60;p&#62;The agent.log does not have any obvious error messages. There are normal messages like this:&#60;br /&#62;
INFO [qtp1925049412-39] 2012-12-31 09:08:44,710 HTTP: :get /os-metric/disk-space  - 200&#60;/p&#62;
&#60;p&#62;We are using DSE 2.1.1, OpsCenter 2.1.2&#60;/p&#62;
&#60;p&#62;Chin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>srk_user on "Opscenter - Cluster -List view not showing the correct size"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-cluster-list-view-not-showing-the-correct-size#post-8207</link>
			<pubDate>Sat, 29 Dec 2012 13:36:31 +0000</pubDate>
			<dc:creator>srk_user</dc:creator>
			<guid isPermaLink="false">8207@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;In the OpsCenter Cluster List view, the data size is not shown correctly. The OS Disk Usage for the Cassandra data file system shows 901GB, but the List view shows only 548GB. Why is the OpsCenter List view not showing the correct size?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Sven on "Pig unable to connect to Cassandra that requires authentication"</title>
			<link>http://www.datastax.com/support-forums/topic/pig-unable-to-connect-to-cassandra-that-requires-authentication#post-8167</link>
			<pubDate>Thu, 27 Dec 2012 21:57:51 +0000</pubDate>
			<dc:creator>Sven</dc:creator>
			<guid isPermaLink="false">8167@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;SimpleAuthenticator and SimpleAuthority are not currently supported by the DSE components. We are investigating the possibility of adding support for enhanced security in a future release of DSE.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cko on "ETL Tools to transfer data from Cassandra into other relational databases"</title>
			<link>http://www.datastax.com/support-forums/topic/etl-tools-to-transfer-data-from-cassandra-into-other-relational-databases#post-8153</link>
			<pubDate>Thu, 27 Dec 2012 06:46:11 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">8153@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi Srini,&#60;/p&#62;
&#60;p&#62;Thanks for your response.&#60;/p&#62;
&#60;p&#62;A few months ago, we briefly tried Pentaho. The job caused an OutOfMemoryError in Cassandra; the Cassandra Pentaho plugin does not retrieve rows in batches. It was mentioned that the next version of the plugin will address that issue, and we will look at it again when it is available.&#60;/p&#62;
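&#60;p&#62;For reference, reading a large CF in batches usually means paging by row key: fetch a fixed number of rows, then start the next request at the last key returned and drop that repeated row. A minimal Python sketch of that loop follows; `fetch_range` is a hypothetical in-memory stand-in for a range-slice read with an inclusive start key, not the Thrift or Pentaho API.&#60;/p&#62;

```python
from bisect import bisect_left


def fetch_range(rows, start_key, count):
    """Hypothetical stand-in for a range-slice read: up to `count` (key, value)
    pairs starting at start_key (inclusive). Plain sorted key order is used here
    for illustration; a real cluster returns keys in partitioner order."""
    keys = sorted(rows)
    i = bisect_left(keys, start_key)
    return [(k, rows[k]) for k in keys[i:i + count]]


def iter_in_batches(rows, batch_size):
    """Yield every (key, value) pair, requesting at most batch_size rows per call."""
    if 2 > batch_size:
        # With a batch size of 1, each page would only repeat the previous key.
        raise ValueError("batch_size must be at least 2")
    start_key = ""
    first_page = True
    while True:
        page = fetch_range(rows, start_key, batch_size)
        # The start key is inclusive, so every page after the first begins
        # with the row that ended the previous page; skip it.
        for item in (page if first_page else page[1:]):
            yield item
        if len(page) != batch_size:
            return  # a short page means the range is exhausted
        start_key = page[-1][0]
        first_page = False
```

&#60;p&#62;For example, `list(iter_in_batches({"a": 1, "b": 2, "c": 3}, 2))` returns `[("a", 1), ("b", 2), ("c", 3)]`. The page size is the tuning knob: larger pages mean fewer round trips but more memory per request.&#60;/p&#62;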
&#60;p&#62;Thanks again.&#60;br /&#62;
Chin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cko on "Pig unable to connect to Cassandra that requires authentication"</title>
			<link>http://www.datastax.com/support-forums/topic/pig-unable-to-connect-to-cassandra-that-requires-authentication#post-8152</link>
			<pubDate>Thu, 27 Dec 2012 06:35:26 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">8152@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi,&#60;/p&#62;
&#60;p&#62;I am testing Cassandra authentication using the sample SimpleAuthenticator and SimpleAuthority classes, and am also securing the Cassandra JMX port. OpsCenter can successfully connect to and monitor a Cassandra cluster with authentication enabled. &#60;/p&#62;
&#60;p&#62;However, there are problems running Pig against a Cassandra cluster with authentication enabled. &#60;/p&#62;
&#60;p&#62;Looking at the Pig stack trace, we found that SimpleAuthenticator was throwing an exception during initialisation, so we added the required JVM -D system properties to both the $DSE_HOME/bin/dsetool and $DSE_HOME/resources/pig/bin/pig scripts.&#60;/p&#62;
&#60;p&#62;JVM_OPTS=&#34;$JVM_OPTS -Dpasswd.properties=$CASSANDRA_HOME/conf/passwd.properties&#34;&#60;br /&#62;
JVM_OPTS=&#34;$JVM_OPTS -Daccess.properties=$CASSANDRA_HOME/conf/access.properties&#34;&#60;br /&#62;
JVM_OPTS=&#34;$JVM_OPTS -Dpasswd.mode=MD5&#34;&#60;/p&#62;
&#60;p&#62;That gets us further, but the CFS custom storage handler then appears to receive an exception from Cassandra when it tries to connect to the Cassandra database without supplying credentials.&#60;/p&#62;
&#60;p&#62;Error before Pig is launched&#60;br /&#62;
----------------------------&#60;br /&#62;
ERROR 2999: Unexpected internal error. Failed to create DataStorage&#60;/p&#62;
&#60;p&#62;java.lang.RuntimeException: Failed to create DataStorage&#60;br /&#62;
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)&#60;br /&#62;
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.&#38;lt;init&#38;gt;(HDataStorage.java:58)&#60;br /&#62;
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:206)&#60;br /&#62;
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:119)&#60;br /&#62;
        at org.apache.pig.impl.PigContext.connect(PigContext.java:185)&#60;br /&#62;
        at org.apache.pig.PigServer.&#38;lt;init&#38;gt;(PigServer.java:244)&#60;br /&#62;
        at org.apache.pig.PigServer.&#38;lt;init&#38;gt;(PigServer.java:229)&#60;br /&#62;
        at org.apache.pig.tools.grunt.Grunt.&#38;lt;init&#38;gt;(Grunt.java:47)&#60;br /&#62;
        at org.apache.pig.Main.run(Main.java:492)&#60;br /&#62;
        at org.apache.pig.Main.main(Main.java:111)&#60;br /&#62;
Caused by: java.io.IOException: InvalidRequestException(why:You have not logged in)&#60;br /&#62;
        at com.datastax.bdp.util.CassandraProxyClient.initialize(CassandraProxyClient.java:227)&#60;br /&#62;
        at com.datastax.bdp.util.CassandraProxyClient.&#38;lt;init&#38;gt;(CassandraProxyClient.java:180)&#60;br /&#62;
        at com.datastax.bdp.util.CassandraProxyClient.newProxyConnection(CassandraProxyClient.java:119)&#60;br /&#62;
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystemThriftStore.initialize(CassandraFileSystemThriftStore.java:195)&#60;br /&#62;
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystem.initialize(CassandraFileSystem.java:67)&#60;br /&#62;
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)&#60;br /&#62;
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)&#60;br /&#62;
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)&#60;br /&#62;
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)&#60;br /&#62;
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)&#60;br /&#62;
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)&#60;br /&#62;
        ... 9 more&#60;br /&#62;
Caused by: InvalidRequestException(why:You have not logged in)&#60;br /&#62;
        at org.apache.cassandra.thrift.Cassandra$describe_keyspaces_result.read(Cassandra.java:22379)&#60;br /&#62;
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)&#60;br /&#62;
        at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspaces(Cassandra.java:1004)&#60;br /&#62;
        at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspaces(Cassandra.java:992)&#60;br /&#62;
        at com.datastax.bdp.util.CassandraProxyClient.initialize(CassandraProxyClient.java:213)&#60;br /&#62;
        ... 19 more&#60;/p&#62;
&#60;p&#62;I could not find anything in the DataStax documentation on how to supply credentials in Pig when connecting to Cassandra. Does the current Cassandra storage handler support connecting to a secured Cassandra database? We are using DSE 2.2.1.&#60;/p&#62;
&#60;p&#62;Thanks in advance.&#60;/p&#62;
&#60;p&#62;Chin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Srini on "ETL Tools to transfer data from Cassandra into other relational databases"</title>
			<link>http://www.datastax.com/support-forums/topic/etl-tools-to-transfer-data-from-cassandra-into-other-relational-databases#post-8015</link>
			<pubDate>Tue, 18 Dec 2012 18:09:37 +0000</pubDate>
			<dc:creator>Srini</dc:creator>
			<guid isPermaLink="false">8015@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;This should help you. It is just an overview; drill down further as your requirements dictate.&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://ovum.com/2012/03/08/pentaho-expands-big-data-coverage/&#34; rel=&#34;nofollow&#34;&#62;http://ovum.com/2012/03/08/pentaho-expands-big-data-coverage/&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://wiki.pentaho.com/display/DATAMINING/Cassandra+Source+and+Sink+in+Weka&#34; rel=&#34;nofollow&#34;&#62;http://wiki.pentaho.com/display/DATAMINING/Cassandra+Source+and+Sink+in+Weka&#60;/a&#62;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cko on "ETL Tools to transfer data from Cassandra into other relational databases"</title>
			<link>http://www.datastax.com/support-forums/topic/etl-tools-to-transfer-data-from-cassandra-into-other-relational-databases#post-7972</link>
			<pubDate>Fri, 14 Dec 2012 03:13:03 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">7972@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;We will use Cassandra as logging storage in one of our web applications. The application only inserts rows into Cassandra and never updates or deletes any rows. The CF is expected to grow by about 0.5 million rows per day.&#60;/p&#62;
&#60;p&#62;We need to transfer the data in Cassandra to another relational database daily. Due to the large size of the CF, instead of truncating the relational table and reloading all rows into it each time, we plan to run a job to select the &#34;delta&#34; rows since the last run and insert them into the relational database.&#60;/p&#62;
&#60;p&#62;We know we can use Java, Pig or Hive to extract the delta rows to a flat file and load the data into the target relational table. We are particularly interested in a process that can extract delta rows without scanning the entire CF.&#60;/p&#62;
&#60;p&#62;Has anyone used any other ETL tools to do this kind of delta extraction from Cassandra? We appreciate any comments and experience.&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Chin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>nickmbailey on "Selecting rows efficiently from a Cassandra CF containing time series data"</title>
			<link>http://www.datastax.com/support-forums/topic/selecting-rows-efficiently-from-a-cassandra-cf-containing-time-series-data#post-7947</link>
			<pubDate>Tue, 11 Dec 2012 16:30:53 +0000</pubDate>
			<dc:creator>nickmbailey</dc:creator>
			<guid isPermaLink="false">7947@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I would recommend reading:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;You likely will want to restructure your data model so that events are stored in a row ordered by time. You could then partition this row by some time value to get better distribution.
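&#60;/p&#62;
&#60;p&#62;A minimal sketch of that layout in Python, with an in-memory dict standing in for the CF (an illustration only, not a Cassandra client API). The one-row-per-hour bucket and the column-name format are hypothetical choices: each hour gets its own wide row, and column names sort by event time, so one hour's events can be read back in order with a single slice.&#60;/p&#62;

```python
from bisect import bisect_left, insort
from datetime import datetime

# Hypothetical in-memory stand-in for the CF:
# row key = hour bucket, column names = event time plus user id, kept sorted
# (string order stands in for whatever comparator the real CF would use).
rows = {}


def row_key(ts: datetime) -> str:
    """One wide row per hour, e.g. "2012121116" (hypothetical key format)."""
    return ts.strftime("%Y%m%d%H")


def column_name(ts: datetime, user_id: str) -> str:
    """Zero-padded timestamps sort lexically in time order; the user id suffix
    keeps simultaneous events from different users distinct."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S.%f") + "|" + user_id


def insert_event(ts: datetime, user_id: str) -> None:
    # Keep each row's column list sorted, as the CF's comparator would.
    insort(rows.setdefault(row_key(ts), []), column_name(ts, user_id))


def slice_row(key: str, start_column: str = "") -> list:
    """Like a column slice on one row: every column from start_column on, in order."""
    columns = rows.get(key, [])
    return columns[bisect_left(columns, start_column):]
```

&#60;p&#62;Reading the new events for one hour is then a slice of a single row starting from a saved column name, rather than a scan across the whole CF. A real CF would more likely use a TimeUUID or composite column comparator than plain strings.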
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cko on "Selecting rows efficiently from a Cassandra CF containing time series data"</title>
			<link>http://www.datastax.com/support-forums/topic/selecting-rows-efficiently-from-a-cassandra-cf-containing-time-series-data#post-7940</link>
			<pubDate>Tue, 11 Dec 2012 13:50:51 +0000</pubDate>
			<dc:creator>cko</dc:creator>
			<guid isPermaLink="false">7940@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I would like to get some opinions on how to select an incremental range of rows efficiently from a Cassandra CF containing time series data.&#60;/p&#62;
&#60;p&#62;Background:&#60;br /&#62;
We have a web application that uses a Cassandra CF as logging storage. We insert a row into the CF for every &#34;event&#34; of each user of the web application. The row key is timestamp+userid. The column values are unstructured data. We only insert rows but never update or delete any rows in the CF. &#60;/p&#62;
&#60;p&#62;Data volume:&#60;br /&#62;
The CF grows by about 0.5 million rows per day. We have a 4 node cluster and use the RandomPartitioner to spread the rows across the nodes.&#60;/p&#62;
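&#60;p&#62;For scale, assuming inserts are spread evenly over the day (real traffic rarely is), those figures work out to roughly:&#60;/p&#62;

$$\frac{500\,000\ \text{rows/day}}{24\ \text{h/day}} \approx 20\,833\ \text{rows/h}, \qquad \frac{500\,000\ \text{rows/day}}{86\,400\ \text{s/day}} \approx 5.8\ \text{rows/s}$$

&#60;p&#62;That puts an hourly bucket, as in the third option under Options considered, at roughly 20,000 rows.&#60;/p&#62;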
&#60;p&#62;Requirements:&#60;br /&#62;
There is a need to transfer the Cassandra data to another relational database periodically. Due to the large size of the CF, instead of truncating the relational table and reloading all rows into it each time, we plan to run a job to select the &#34;delta&#34; rows since the last run and insert them into the relational database.&#60;/p&#62;
&#60;p&#62;We would like some flexibility in how often the data transfer job runs. It may run several times a day, or it may not run at all on some days.&#60;/p&#62;
&#60;p&#62;Options considered:&#60;br /&#62;
- We are using RandomPartitioner, so range scans by row key are not feasible.&#60;br /&#62;
- Add a secondary index on the timestamp column. However, reading rows via a secondary index still requires an equality condition and does not support range scans.&#60;br /&#62;
- Add a secondary index on a column containing the date and hour of the timestamp, then iterate over each hour between the time the job was last run and now, fetching all rows for each hour.&#60;/p&#62;
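&#60;p&#62;A minimal Python sketch of the third option, assuming the indexed column holds a hypothetical &#34;YYYYMMDDHH&#34; string and that both times are naive datetimes in one consistent time zone: compute every hour label from the hour of the last run through the current hour, then issue one equality query on the index per label.&#60;/p&#62;

```python
from datetime import datetime, timedelta


def hour_buckets(last_run: datetime, now: datetime) -> list:
    """Return one "YYYYMMDDHH" label per hour, from the hour of last_run through
    the hour of now (inclusive). Each label is a hypothetical value of the
    indexed date+hour column and maps to one equality query on that index."""
    start = last_run.replace(minute=0, second=0, microsecond=0)
    end = now.replace(minute=0, second=0, microsecond=0)
    hours = int((end - start).total_seconds() // 3600)
    # If now precedes last_run, hours is negative and the range is empty.
    return [(start + timedelta(hours=h)).strftime("%Y%m%d%H") for h in range(hours + 1)]
```

&#60;p&#62;For example, a last run at 22:40 on 11 Dec 2012 and a current time of 01:05 on 12 Dec 2012 give the four labels 2012121122 through 2012121201. The first and last hours are partial, so the job still needs to filter on the full timestamp (or tolerate re-copying a few rows) to avoid duplicates.&#60;/p&#62;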
&#60;p&#62;I would appreciate any ideas about other designs for the Cassandra CF that would allow the rows to be extracted efficiently.&#60;/p&#62;
&#60;p&#62;Besides Java, has anyone used any ETL tools to do this kind of delta extraction from Cassandra?
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
