While using DSE 2.0, how can we get Hadoop/Hive to increase the number of map tasks for maximum performance? We are trying to run MapReduce with the input data coming from Cassandra. Are there settings, such as a Cassandra input buffer size or block size, that MapReduce can see and that would increase the number of map tasks?
Number of Map Tasks
Posted 1 year ago #
-
Yes. The setting is "cassandra.input.split.size"; its default is 64k rows.
Posted 1 year ago # -
Thanks. Where do we set this parameter? In mapred-site.xml?
Posted 1 year ago # -
You can set it there, or as part of the job using -D on the hadoop command line.
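A minimal sketch of the per-job -D form mentioned above. The jar name, class name, and the value 16384 are hypothetical placeholders; only the property name comes from this thread. It also assumes the job parses generic options (for example through ToolRunner), since otherwise -D is not picked up.

```shell
# Hypothetical jar/class names; only the property name is from the thread.
SPLIT_SIZE=16384   # rows per split, smaller than the 64k default

# Generic -D options go after the main class and before the job's own arguments.
CMD="hadoop jar my-job.jar com.example.MyJob -Dcassandra.input.split.size=${SPLIT_SIZE}"

# Print the command for inspection; run it yourself once it looks right.
echo "$CMD"
```

Setting the same property in mapred-site.xml would make it the default for every job rather than just this one.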
Posted 1 year ago # -
Is it the same as setting HIVE_OPT while invoking hive? Or should we make it part of TBLPROPERTIES while defining an external table?
export HIVE_OPT="-Dcassandra.input.split.size=268435456"
We are trying to increase the number of map tasks that get kicked off as a result of a simple query like
select count(*) from tablefoo;
Posted 1 year ago # -
Make it part of the TBLPROPERTIES.
The smaller the split size, the more mappers it will create.
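To make the relationship concrete, here is a rough sketch that estimates the mapper count as rows divided by split size, rounded up. The function name and the 10,000,000-row figure are made up for illustration, and the real count also depends on how the token ranges are divided across the cluster, so treat the numbers as ballpark only. The ALTER TABLE line assumes the storage handler reads the property from TBLPROPERTIES, as suggested above; it is printed, not executed.

```shell
# Hypothetical helper: ceil(rows / split_size) using integer arithmetic.
estimate_splits() {
  rows=$1
  split=$2
  echo $(( (rows + split - 1) / split ))
}

ROWS=10000000   # assumed row count for tablefoo, for illustration only

echo "default 64k rows:  $(estimate_splits "$ROWS" 65536) mappers"
echo "268435456 rows:    $(estimate_splits "$ROWS" 268435456) mappers"
echo "16384 rows:        $(estimate_splits "$ROWS" 16384) mappers"

# HiveQL one might try on an existing table (value 16384 is an example).
HQL="ALTER TABLE tablefoo SET TBLPROPERTIES ('cassandra.input.split.size' = '16384');"
echo "$HQL"
```

Under these assumptions a split size of 268435456 yields a single split, which would explain seeing fewer map tasks rather than more.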
Posted 1 year ago #
