While using DSE 2.0, how can we get Hadoop/Hive to increase the number of map tasks for maximum performance? We are trying to run MapReduce with the input data coming from Cassandra. Are there settings, such as a Cassandra input buffer size or block size, that MapReduce can see and that would increase the number of map tasks?
Number of Map Tasks
Yes. The setting is "cassandra.input.split.size"; its default is 64k rows.
Thanks. Where do we set this parameter? In mapred-site.xml?
You can set it there, or set it per job by passing -D on the hadoop command line.
Is it the same as setting HIVE_OPT while invoking hive? Or should we make it part of TBLPROPERTIES while defining an external table?
We are trying to increase the number of map tasks that get kicked off as a result of a simple query like
select count(*) from tablefoo;
Make it part of the TBLPROPERTIES.
The smaller the split size, the more mappers it will create.
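To get a feel for how the split size relates to the mapper count, here is a rough back-of-the-envelope sketch in Python. It assumes one map task per input split and that "64k rows" means 64 * 1024. The function name and the row count are made up for illustration. The real number of splits also depends on how Cassandra divides the data across token ranges, so treat the result as an estimate only.

```python
import math

# Default from the thread: "64k rows", assumed here to mean 64 * 1024.
DEFAULT_SPLIT_SIZE = 64 * 1024


def estimate_map_tasks(total_rows, split_size=DEFAULT_SPLIT_SIZE):
    """Roughly estimate the number of map tasks for a given row count.

    Assumes one map task per split and ignores per-token-range effects,
    so this is an approximation, not what the job will report exactly.
    """
    if total_rows < 0 or split_size <= 0:
        raise ValueError("total_rows must be >= 0 and split_size must be > 0")
    # At least one task is assumed, even for a tiny or empty table.
    return max(1, math.ceil(total_rows / split_size))


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical size of tablefoo
    for size in (DEFAULT_SPLIT_SIZE, 16 * 1024, 4 * 1024):
        print(f"split size {size}: about {estimate_map_tasks(rows, size)} map tasks")
```

Lowering the value you put in "cassandra.input.split.size" by a factor of four should, under these assumptions, give roughly four times as many mappers.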