<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Read huge volume of data from Cassandra</title>
		<link>http://www.datastax.com/support-forums/topic/read-huge-volume-of-data-from-cassandra</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Tue, 21 May 2013 09:25:29 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/read-huge-volume-of-data-from-cassandra" rel="self" type="application/rss+xml" />

		<item>
			<title>alexliu on "Read huge volume of data from Cassandra"</title>
			<link>http://www.datastax.com/support-forums/topic/read-huge-volume-of-data-from-cassandra#post-7444</link>
			<pubDate>Thu, 08 Nov 2012 18:06:52 +0000</pubDate>
			<dc:creator>alexliu</dc:creator>
			<guid isPermaLink="false">7444@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Cassandra-JDBC is a different approach from CQL. CQL is preferred over Cassandra-JDBC if CQL meets your needs.&#60;/p&#62;
&#60;p&#62;You can use any other Cassandra client to read in parallel from the client side. Cassandra is built for high read and write throughput in a distributed system.&#60;/p&#62;
&#60;p&#62;Hadoop MapReduce, which reads and writes in parallel, also works in some cases.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Pearly on "Read huge volume of data from Cassandra"</title>
			<link>http://www.datastax.com/support-forums/topic/read-huge-volume-of-data-from-cassandra#post-7434</link>
			<pubDate>Wed, 07 Nov 2012 22:18:31 +0000</pubDate>
			<dc:creator>Pearly</dc:creator>
			<guid isPermaLink="false">7434@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;As per the documentation, sstable2json converts the on-disk SSTable representation of a column family into a JSON-formatted document. But I don't think we can add a filter condition to this (like get all rows where columnA='something1' and columnB='something2'). I'd appreciate your confirmation here. &#60;/p&#62;
&#60;p&#62;If the above doesn't help, my next option is to use &#34;cassandra-jdbc&#34;. In your earlier response, you didn't seem to mention anything about cassandra-jdbc, only the standalone &#34;CQL&#34; query. Could you confirm the usage of CQL with Cassandra-JDBC? Is it a standard approach?&#60;br /&#62;
Also, I assume secondary indexes are required if we need to perform the CQL query. &#60;/p&#62;
&#60;p&#62;Other than the solutions I mentioned above, is there any standard approach if I want to read a high volume of data from a column family based on some filter conditions? &#60;/p&#62;
&#60;p&#62;I appreciate your help with these.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>alexliu on "Read huge volume of data from Cassandra"</title>
			<link>http://www.datastax.com/support-forums/topic/read-huge-volume-of-data-from-cassandra#post-6159</link>
			<pubDate>Mon, 27 Aug 2012 08:04:36 +0000</pubDate>
			<dc:creator>alexliu</dc:creator>
			<guid isPermaLink="false">6159@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;CQL is preferred for Cassandra queries. You can read multiple CFs in parallel, with the load balanced across the Cassandra ring. &#60;/p&#62;
&#60;p&#62;There is another tool, sstable2json, for bulk reading:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://www.datastax.com/docs/1.0/references/sstable2json&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.0/references/sstable2json&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;Cassandra comes with Hadoop integration, including Hive and Pig, and it implements the Hadoop API. You shouldn't worry about how the data is read.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Pearly on "Read huge volume of data from Cassandra"</title>
			<link>http://www.datastax.com/support-forums/topic/read-huge-volume-of-data-from-cassandra#post-6111</link>
			<pubDate>Mon, 20 Aug 2012 23:49:35 +0000</pubDate>
			<dc:creator>Pearly</dc:creator>
			<guid isPermaLink="false">6111@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;What is the preferred/best approach if I want to read a huge volume of data from Cassandra for reporting purposes? Is there any difference in approach if the read is from one CF vs. multiple CFs? (My current need is from one CF.)&#60;/p&#62;
&#60;p&#62;Is it optimal to use cassandra-jdbc and CQL with &#34;select x,y,z…..where /some/ condition&#34;? Or does this problem bring up the need for Hive or Pig? &#60;/p&#62;
&#60;p&#62;To give a short description of my current environment, I have 2 DCs with 3 nodes in each DC, with the nodes started with -s (for Solr).
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
