<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: brisk does not appear to use snappy &#34;by default&#34;</title>
		<link>http://www.datastax.com/support-forums/topic/brisk-does-not-appear-to-use-snappy-by-default</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Wed, 22 May 2013 15:56:57 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/brisk-does-not-appear-to-use-snappy-by-default" rel="self" type="application/rss+xml" />

		<item>
			<title>SVBridget on "brisk does not appear to use snappy &#34;by default&#34;"</title>
			<link>http://www.datastax.com/support-forums/topic/brisk-does-not-appear-to-use-snappy-by-default#post-442</link>
			<pubDate>Mon, 15 Aug 2011 19:04:18 +0000</pubDate>
			<dc:creator>SVBridget</dc:creator>
			<guid isPermaLink="false">442@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Follow-up from the IRC...&#60;/p&#62;
&#60;p&#62;&#38;lt;@tjake&#38;gt; your last post is correct&#60;br /&#62;
&#38;lt;@tjake&#38;gt; hadoop reports 8MB but on disk it's 2MB&#60;br /&#62;
&#38;lt;SVBridget&#38;gt; is there a way to ask hadoop how much space it is taking up on disk?&#60;br /&#62;
&#38;lt;@tjake&#38;gt; no but you can ask cassandra :)&#60;br /&#62;
&#38;lt;SVBridget&#38;gt; it's weird that for D it doesn't say 8MB also...&#60;br /&#62;
&#38;lt;@tjake&#38;gt; well in case of D hadoop knows its being compressed&#60;br /&#62;
&#38;lt;@tjake&#38;gt; in CFS we are compressing below the hadoop api&#60;br /&#62;
&#38;lt;@tjake&#38;gt; so it doesn't know&#60;br /&#62;
&#38;lt;SVBridget&#38;gt; ah ok&#60;br /&#62;
&#38;lt;SVBridget&#38;gt; that makes sense&#60;br /&#62;
&#38;lt;SVBridget&#38;gt; tx for clarifying, we'll take a look at what cassandra is saying for these files&#60;br /&#62;
&#38;lt;SVBridget&#38;gt; but basically there is  no point in pre-compressing any files we are going to put into cassandra... because they're going to be stored block/snappy compressed anyway&#60;br /&#62;
&#38;lt;SVBridget&#38;gt; so it's pointless to, say, snappy compress, then put into cfs, right?&#60;br /&#62;
&#38;lt;@tjake&#38;gt; SVBridget: right&#60;br /&#62;
&#38;lt;@tjake&#38;gt; it does help to compress map output&#60;br /&#62;
&#38;lt;@tjake&#38;gt; the nice thing about the built-in snappy compression is its done task side&#60;br /&#62;
&#38;lt;@tjake&#38;gt; so lowers network io&#60;br /&#62;
&#38;lt;jeromatron&#38;gt; tjake: cool, so intermediate results are more efficiently compressed.&#60;br /&#62;
&#38;lt;@tjake&#38;gt; jeromatron: yes if you enable map output compression using snappy codec&#60;br /&#62;
&#38;lt;@tjake&#38;gt; the map output goes to hadoop.tmp which is on local disk&#60;br /&#62;
&#38;lt;jeromatron&#38;gt; sweet corn.  yep - pretty cool.  and you don't have to use lzo :)
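&#60;/p&#62;
&#60;p&#62;The point about pre-compressing can be illustrated with a small sketch. It is not Brisk or CFS code: zlib from the Python standard library stands in for Snappy, and make_sample and compress_twice are hypothetical helpers. It compresses the same data once and then a second time, to show that a second pass over already-compressed bytes gains essentially nothing.&#60;/p&#62;
&#60;pre&#62;
```python
import random
import zlib


def make_sample(lines: int = 20000, seed: int = 0) -> bytes:
    """Build deterministic, moderately compressible text (hypothetical data)."""
    rng = random.Random(seed)
    words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
             "golf", "hotel", "india", "juliet", "kilo", "lima"]
    rows = (" ".join(rng.choice(words) for _ in range(8)) for _ in range(lines))
    return "\n".join(rows).encode("ascii")


def compress_twice(data: bytes):
    """Return (raw, once, twice) sizes in bytes.

    zlib is an assumption standing in for Snappy, which is not in the stdlib.
    """
    once = zlib.compress(data)    # like pre-compressing a file before the put
    twice = zlib.compress(once)   # like the storage layer compressing it again
    return len(data), len(once), len(twice)


if __name__ == "__main__":
    raw, once, twice = compress_twice(make_sample())
    print("raw bytes:", raw)
    print("compressed once:", once)
    print("compressed twice:", twice)
```
&#60;/pre&#62;
&#60;p&#62;Calling compress_twice(make_sample()) returns the raw, once-compressed, and twice-compressed sizes. The last two should be nearly equal, which is consistent with the advice that compressing a file before putting it into CFS buys little.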
&#60;/p&#62;</description>
		</item>
		<item>
			<title>SVBridget on "brisk does not appear to use snappy &#34;by default&#34;"</title>
			<link>http://www.datastax.com/support-forums/topic/brisk-does-not-appear-to-use-snappy-by-default#post-434</link>
			<pubDate>Fri, 12 Aug 2011 17:15:52 +0000</pubDate>
			<dc:creator>SVBridget</dc:creator>
			<guid isPermaLink="false">434@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Here's our scenario:&#60;br /&#62;
* A: We have a text file that is 8 MB (as reported by linux, i.e. before it is in hadoop).&#60;br /&#62;
* B: When we do a brisk hadoop put of this file and then get the file size from hadoop, it's also 8 MB.&#60;br /&#62;
* C: We then made a hive table that points to the raw file we put into hadoop, and selected out of that hive table into a new hive table.  A brisk hadoop fs -ls on the underlying data file for that 2nd hive table showed 8 MB.&#60;br /&#62;
* D: Finally, we did basically the same thing, except we set hive properties to enable compression (the three SET statements in my original post).  A brisk hadoop fs -ls on the underlying file for the 2nd hive table then showed 2 MB.&#60;/p&#62;
&#60;p&#62;So this is basically what led to my questions... this seems pretty confusing.  If CFS is compressing &#34;automatically&#34;, I would expect the size of the underlying data file for case &#34;C&#34; to be the same as for &#34;D&#34;.  It seems like you are saying that hadoop is going to report the uncompressed size, and if that is the case, shouldn't &#34;C&#34; and &#34;D&#34; both report 8 MB?  &#60;/p&#62;
&#60;p&#62;Sorry if it seems like we're going into a rat hole on this, but we really would like to understand how this works, so we can make sure we are compressing properly...&#60;/p&#62;
&#60;p&#62;And at a nuts and bolts level, our question is: do we need to specify any compression parameters to hive (i.e. the three SET statements in my original post), or, since CFS snappy compresses &#34;everything&#34;, can we simply skip any hive compression settings and still have the data we put in any hive table compressed?&#60;/p&#62;
&#60;p&#62;Thanks!&#60;br /&#62;
-B
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "brisk does not appear to use snappy &#34;by default&#34;"</title>
			<link>http://www.datastax.com/support-forums/topic/brisk-does-not-appear-to-use-snappy-by-default#post-430</link>
			<pubDate>Thu, 11 Aug 2011 23:56:32 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">430@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;When you say the file is the same size, do you mean as reported by hadoop ls?&#60;br /&#62;
Internally, on disk, the dfs blocks are compressed, but hadoop thinks each one is a regular uncompressed block.
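&#60;/p&#62;
&#60;p&#62;A minimal sketch of this behaviour, assuming nothing about the real CFS implementation: CompressedBlockStore is a hypothetical toy class, BLOCK_SIZE is an arbitrary value, and zlib stands in for Snappy. It shows how a store that compresses blocks below its API can report the uncompressed size while holding far fewer bytes on disk.&#60;/p&#62;
&#60;pre&#62;
```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for this demo


class CompressedBlockStore:
    """Toy file store that compresses each block below its public API."""

    def __init__(self):
        self._blocks = []        # compressed blocks, as they would sit on disk
        self._logical_size = 0   # uncompressed byte count the API reports

    def put(self, data: bytes) -> None:
        """Split data into blocks and store each block compressed."""
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            # zlib is an assumption standing in for Snappy (not in the stdlib)
            self._blocks.append(zlib.compress(block))
        self._logical_size += len(data)

    def reported_size(self) -> int:
        """Size a client sees through the API (like `hadoop fs -ls`)."""
        return self._logical_size

    def on_disk_size(self) -> int:
        """Bytes actually held by the store after compression."""
        return sum(len(b) for b in self._blocks)

    def get(self) -> bytes:
        """Decompress and reassemble the original bytes."""
        return b"".join(zlib.decompress(b) for b in self._blocks)


def demo():
    """Return (raw size, reported size, on-disk size, round-trip ok)."""
    text = b"".join(b"row %d,some repeated log text\n" % i for i in range(50000))
    store = CompressedBlockStore()
    store.put(text)
    return len(text), store.reported_size(), store.on_disk_size(), store.get() == text


if __name__ == "__main__":
    print(demo())
```
&#60;/pre&#62;
&#60;p&#62;demo() returns the original length, the size reported through the API, the compressed size held by the store, and whether the data round-trips. The reported size matches the original, which is the analogue of hadoop fs -ls showing 8 MB for a file that occupies less space on disk.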
&#60;/p&#62;</description>
		</item>
		<item>
			<title>SVBridget on "brisk does not appear to use snappy &#34;by default&#34;"</title>
			<link>http://www.datastax.com/support-forums/topic/brisk-does-not-appear-to-use-snappy-by-default#post-429</link>
			<pubDate>Thu, 11 Aug 2011 23:13:32 +0000</pubDate>
			<dc:creator>SVBridget</dc:creator>
			<guid isPermaLink="false">429@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hello-&#60;br /&#62;
The documentation for brisk beta-2 says &#34;BRISK-207: New Snappy Compression Codec built on Google Snappy is now used internally for automatic CassandraFS block compression.&#34;&#60;/p&#62;
&#60;p&#62;It appears that Snappy is never used &#34;automatically&#34; by CFS, though.  If we do a file put, the file remains the same size in cfs, whereas I would expect to see a file smaller than the original.  Similarly, for a hive table that is not using the cassandrastoragehandler (but is using cfs to store its data, since cfs backs hive in brisk), we're seeing that the data is not compressed automatically either.  We have to set some hive parameters to get compression:&#60;br /&#62;
SET hive.exec.compress.output=true;&#60;br /&#62;
SET mapred.output.compression.codec=com.hadoop.compression.snappy.SnappyCodec;&#60;br /&#62;
SET mapred.output.compression.type=BLOCK;&#60;/p&#62;
&#60;p&#62;I'm not sure if we are interpreting the word &#34;automatic&#34; incorrectly in the description of BRISK-207, or if we're doing something wrong, or there's a bug.  I guess I was expecting that snappy would be compressing *everything* we put into cfs, no matter what route we take.  Is there a way to make cfs behave this way all the time?  Or is this not recommended?&#60;/p&#62;
&#60;p&#62;Any more guidance you can provide would be appreciated...&#60;/p&#62;
&#60;p&#62;Thanks!&#60;br /&#62;
-B
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
