While using DSE 2.0, how can we get Hadoop/Hive to increase the number of map tasks for maximum performance? We are running MapReduce jobs whose input data comes from Cassandra. Is there something like a Cassandra input buffer size or block size that MapReduce can see, so that it creates more map tasks?
Number of Map Tasks
Yes. The setting is "cassandra.input.split.size"; its default is 64k rows per split.
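To see why this setting drives the mapper count, here is a rough Python estimate. It assumes each Cassandra token range is cut into ceil(rows / split_size) input splits, with one map task per split and at least one split per range. Real split counts come from Cassandra's own per-range row estimates, so treat this as an approximation, not the actual split algorithm. The row counts in the usage lines are made up.

```python
import math

# Default from the thread: "cassandra.input.split.size" is 64k rows.
DEFAULT_SPLIT_SIZE = 64 * 1024


def estimate_map_tasks(rows_per_range, split_size=DEFAULT_SPLIT_SIZE):
    """Rough estimate of map tasks for a Cassandra-backed Hadoop job.

    Assumptions (not taken from DSE itself):
      * each token range is divided into ceil(rows / split_size) splits,
      * every range yields at least one split,
      * each split becomes exactly one map task.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return sum(max(1, math.ceil(rows / split_size)) for rows in rows_per_range)


if __name__ == "__main__":
    # Hypothetical table: three token ranges of one million rows each.
    ranges = [1_000_000, 1_000_000, 1_000_000]
    print(estimate_map_tasks(ranges))          # 64k default split size
    print(estimate_map_tasks(ranges, 16_384))  # smaller 16k split size
```

With these made-up numbers the default gives 48 map tasks and a 16k split size gives 186: a smaller split size means more mappers.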
Thanks. Where do we set this parameter? In mapred-site.xml?
You can set it there, or per job by passing it with -D on the hadoop command line.
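For the per-job route, this small Python helper builds the command. It assumes the job's main class runs through Hadoop's ToolRunner/GenericOptionsParser, which is what applies generic "-D key=value" options placed right after the class name. The jar name and class name are hypothetical placeholders.

```python
import shlex


def hadoop_job_command(jar, main_class, split_size, job_args=()):
    """Build argv for a Hadoop job that overrides the Cassandra split size.

    Assumption: main_class uses ToolRunner/GenericOptionsParser, so the
    generic "-D key=value" option after the class name lands in the job
    configuration. Jobs that parse arguments themselves will ignore it.
    """
    return [
        "hadoop", "jar", jar, main_class,
        "-D", f"cassandra.input.split.size={split_size}",
        *job_args,
    ]


if __name__ == "__main__":
    # "myjob.jar" and "com.example.CountRows" are hypothetical names.
    cmd = hadoop_job_command("myjob.jar", "com.example.CountRows", 16384)
    print(shlex.join(cmd))
```

Setting the property in mapred-site.xml changes the default for every job. The -D form affects only the job it is passed to.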
Is that the same as setting HIVE_OPTS when invoking hive? Or should we make it part of TBLPROPERTIES when defining an external table?
We are trying to increase the number of map tasks that get kicked off by a simple query like:
select count(*) from tablefoo;
Make it part of the TBLPROPERTIES.
The smaller the split size, the more mappers it will create.
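Following the TBLPROPERTIES advice, this Python helper generates the HiveQL that stores the property on an existing table, using Hive's standard ALTER TABLE ... SET TBLPROPERTIES syntax. The same key/value pair can also go in the TBLPROPERTIES clause of CREATE EXTERNAL TABLE. The value 16384 is only an example. Whether the DSE storage handler picks the property up as expected is worth confirming by checking how many mappers the count(*) query launches.

```python
def split_size_ddl(table, split_size):
    """Return HiveQL that stores cassandra.input.split.size on a table.

    The table name is interpolated as-is, so it must be a trusted,
    already-valid Hive identifier.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('cassandra.input.split.size'='{split_size}');"
    )


if __name__ == "__main__":
    # Halving or quartering the 64k default should raise the mapper count.
    print(split_size_ddl("tablefoo", 16384))
```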