Hi everybody,
I have installed DSE 1.0.1 on a 3-node cluster to test a Pig map/reduce job.
I want to count the columns in each row of a ColumnFamily with this script:
rows = LOAD 'cassandra://<keyspace>/<columnfamily>' USING CassandraStorage();
counted = FOREACH rows GENERATE $0,COUNT_STAR($1);
STORE counted INTO '/tmp/column-count-result' USING PigStorage();
It works fine, but by default only a limited number of columns is loaded per row (1024).
So I replaced my LOAD statement with this:
rows = LOAD 'cassandra://<keyspace>/<columnfamily>?limit=2147483647' USING CassandraStorage();
where 2147483647 = Integer.MAX_VALUE = the maximum number of columns in a row of a ColumnFamily.
Unfortunately, TimedOutException and OutOfMemoryError: Java heap space errors then appear in the Cassandra logs, and the job fails.
So I tried tuning the configuration as described here:
http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
But the result is the same.
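For context, here is a sketch of the kind of job-side setting that page discusses, placed at the top of the Pig script. It assumes that Pig's set command passes the property through to the Hadoop job configuration and that CassandraStorage honours cassandra.range.batch.size; the value 256 is only illustrative, not a recommendation. The page also talks about raising rpc_timeout_in_ms in cassandra.yaml on the server side.
-- assumption: ask for fewer rows per range request so that each response stays smaller
-- (this controls rows per request, not columns per row)
SET cassandra.range.batch.size 256;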
Does anybody have an idea how to configure DSE in order to count a large number of columns?
Thanks
