I'm working on a project where we have stored a large amount of data in a column family: upwards of 3 billion columns across thousands of rows. To extract this data and write it to a file on Amazon S3, we've tried typical big data tools such as Hive and Pig. However, when they access the data through CFS or CassandraStorage, the jobs consistently run into timeout exceptions.
I'm interested in knowing whether anyone has come across this type of problem, what optimizations might resolve it, and whether there are other tools that may be better suited to this task.