All of our data files are stored in S3, LZO-compressed. Our current Hadoop/Hive cluster both consumes and produces files of this type.
We are now in the process of evaluating whether Brisk/Cassandra can work with our existing data. To see whether Brisk/Hive can read the data, I created an external table pointing to S3:
CREATE EXTERNAL TABLE exp_web_xxx
(
  request STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '9'
LINES TERMINATED BY '10'
STORED AS TEXTFILE
LOCATION 's3n://......';
We are using the demo AMI for our Brisk cluster. I changed the Hadoop configuration according to
https://github.com/riptano/brisk/wiki/Installing-LZO-compression
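A note on the numeric delimiters in the DDL above. As we understand Hive's LazySimpleSerDe, a delimiter spec that parses as a Java byte is used as that byte value, so '9' means a tab and '10' a newline. Anything else falls back to its first character. The rough Python sketch below approximates that rule and shows how a decompressed line would map onto the single request column. The function names are hypothetical helpers of ours, not Hive APIs, and edge cases (whitespace, non-Latin characters) may differ from Hive's Java code.

```python
def delimiter_byte(spec: str) -> bytes:
    """Approximate Hive's delimiter parsing (assumption, not Hive's code):
    a string that parses as a Java byte (-128..127) is used as that byte
    value; otherwise the first character is cast to a byte."""
    try:
        value = int(spec)
    except ValueError:
        value = None
    if value is not None and -128 <= value <= 127:
        return bytes([value & 0xFF])
    return bytes([ord(spec[0]) & 0xFF])


def parse_row(line: bytes, field_delim: bytes, num_columns: int) -> list:
    """Split one text line into fields. Missing columns become None (NULL)
    and surplus fields are dropped, mirroring how a delimited Hive table
    with fewer columns than fields behaves."""
    fields = line.split(field_delim)
    fields += [None] * (num_columns - len(fields))
    return fields[:num_columns]


if __name__ == "__main__":
    tab = delimiter_byte("9")
    # A made-up decompressed log line with two tab-separated fields:
    print(parse_row(b"GET /index.html\t200", tab, 1))
```

With the one-column table above, only the first tab-separated field of each decompressed line would surface in request.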
Although the demo AMI uses a different directory layout from the production Brisk image, I believe I have adjusted brisk-env.sh and Hadoop's mapred-site.xml accordingly. I also edited brisk/resources/hive/conf/hive-site.xml to enable LZO compression. However, I still can't get Hive to work with our LZO-compressed files. When I run a basic query such as select * from exp_web_xxx limit 10, the output comes back as raw compressed data rather than decompressed text.
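One quick way to confirm that the source objects really are lzop-format files (which is what hadoop-lzo's LzopCodec reads) is to download one locally and check its header. Files written by the lzop tool begin with a fixed 9-byte magic number. The script below is a minimal sketch under that assumption. The script name and paths are hypothetical, and it does not talk to S3 itself.

```python
import sys

# Magic bytes at the start of every lzop-format file
# (0x89 'L' 'Z' 'O' 0x00 '\r' '\n' 0x1A '\n').
LZOP_MAGIC = bytes([0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A])


def is_lzop(data: bytes) -> bool:
    """Return True if data begins with the lzop file header magic."""
    return data[:len(LZOP_MAGIC)] == LZOP_MAGIC


def file_is_lzop(path: str) -> bool:
    """Read only the first few bytes of a local file and test for the magic."""
    with open(path, "rb") as f:
        return is_lzop(f.read(len(LZOP_MAGIC)))


if __name__ == "__main__":
    # Hypothetical usage: python check_lzop.py part-00000.lzo
    for p in sys.argv[1:]:
        print(p, "lzop" if file_is_lzop(p) else "not lzop")
```

If the files pass this check but Hive still returns the header bytes as row text, the LZO codec is likely not being applied on the read path.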
Does anyone have experience running Brisk/Cassandra with LZO-compressed files as input data?
