Hello-
The documentation for brisk beta-2 says "BRISK-207: New Snappy Compression Codec built on Google Snappy is now used internally for automatic CassandraFS block compression."
It appears that Snappy is never used "automatically" by CFS, though. For example, if we do a file put, the file remains the same size in CFS; I would expect it to be smaller than the original file. Similarly, for a Hive table that is not using the cassandrastoragehandler (but is using CFS to store its data, since CFS backs Hive in Brisk), we're seeing that the data is not compressed automatically either. We have to set some Hive parameters to get compressed output:
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.snappy.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
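For context, here is a minimal sketch of how those settings fit into a Hive session. The table names events_raw and events_snappy are hypothetical, and the path assumes Hive's default warehouse location, so adjust both for your setup:

```sql
-- Enable compressed job output for this session (same settings as above).
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.snappy.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

-- Hypothetical tables: rewrite the raw data so the output files are written compressed.
INSERT OVERWRITE TABLE events_snappy
SELECT * FROM events_raw;

-- Compare on-disk sizes from the Hive CLI (assumes the default warehouse directory).
dfs -du /user/hive/warehouse/events_raw;
dfs -du /user/hive/warehouse/events_snappy;
```

Without the three SET lines, both directories report roughly the same size for us, which is what prompted the question.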
I'm not sure whether we are misinterpreting the word "automatic" in the description of BRISK-207, whether we're doing something wrong, or whether there's a bug. I was expecting that Snappy would compress *everything* we put into CFS, no matter what route we take. Is there a way to make CFS behave this way all the time? Or is this not recommended?
Any more guidance you can provide would be appreciated...
Thanks!
-B
