All of our data files are stored in S3, LZO-compressed, and our current Hadoop/Hive cluster both consumes and produces files of this type.
We are now in the process of evaluating whether Brisk/Cassandra can work with our existing data. I created an external table pointing to S3:
create external table exp_web_xxx (request STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '9'
  LINES TERMINATED BY '10'
STORED AS TEXTFILE
LOCATION 's3n://......';
to see if Brisk/Hive can read the data correctly. We are using the demo AMI for our Brisk cluster, and I changed the Hadoop configuration for LZO support.
Although the demo AMI has a different path layout than the production Brisk image, I believe I have updated brisk-env.sh and Hadoop's mapred-site.xml accordingly. I also changed brisk/resources/hive/conf/hive-site.xml for LZO compression. However, I can't get Hive to work with our LZO-compressed files. When I run a basic query such as
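For reference, the LZO-related properties I added look roughly like this. The codec class names come from the hadoop-lzo (hadoop-gpl-compression) project, and this is only a sketch of my config, assuming that jar and its native libraries are already on the cluster:

```xml
<!-- mapred-site.xml (or core-site.xml, depending on the setup):
     register the LZO codecs so Hadoop/Hive can decompress files.
     Assumes the hadoop-lzo jar is on the classpath and its native
     libraries are in java.library.path. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

As I understand it, the codec is matched by file extension, so lzop-written files need the .lzo suffix and the LzopCodec entry above.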
select * from exp_web_xxx limit 10;
the output is not decompressed; I get raw LZO bytes instead of text.
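One check I can think of (a sketch; the bucket and path are placeholders, not our real location): hadoop fs -text uses the same configured codec list to decompress files by extension, so if it also prints raw bytes, the codec registration or the native LZO library would seem to be the problem rather than Hive itself:

```shell
# Placeholder path -- substitute a real .lzo file from the bucket.
# If this prints readable text, the codecs are registered and the
# problem is on the Hive side; if it prints raw bytes, Hadoop itself
# is not picking up the LZO codec.
hadoop fs -text s3n://mybucket/path/file.lzo | head
```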
Does anyone have experience running Brisk/Cassandra with LZO-compressed files as input data?