How do I tell Hadoop how much memory to allocate to a single map task?
I've created an Elastic MapReduce job, and I'm trying to optimize its performance.
At the moment I'm trying to increase the number of mappers per instance. I am doing this via mapred.tasktracker.map.tasks.maximum=X:
elastic-mapreduce --create --alive --num-instances 3 \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
--args -s,mapred.tasktracker.map.tasks.maximum=5
Each time I try to set X above 2 on a small instance, the initialization fails, from which I conclude that Hadoop allocates 800 MB of memory per map task. That seems excessive to me; I'd like it to be 400 MB tops.
How do I tell hadoop to use less memory for each map task?
Check the mapred.child.java.opts property. It defaults to -Xmx200m, which means 200 MB of heap for each map/reduce task.
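To change that limit on EMR, you can pass the property through the same configure-hadoop bootstrap action used in the question. This is only a sketch: it assumes the old elastic-mapreduce CLI accepts multiple comma-separated -s pairs in --args, and -Xmx400m is just an illustrative value matching the 400 MB cap you mentioned:
elastic-mapreduce --create --alive --num-instances 3 \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
--args -s,mapred.tasktracker.map.tasks.maximum=5,-s,mapred.child.java.opts=-Xmx400m
On a plain (non-EMR) cluster the equivalent is setting mapred.child.java.opts to -Xmx400m in mapred-site.xml (or hadoop-site.xml on older releases).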
It looks like an EC2 small instance has 1.7 GB of memory. Here is the memory used, with the default settings, by the Hadoop processes on a TaskTracker node (figures courtesy of "Hadoop: The Definitive Guide"):
Datanode: 1,000 MB
Tasktracker: 1,000 MB
Tasktracker child map tasks: 400 MB (2 * 200 MB)
Tasktracker child reduce tasks: 400 MB (2 * 200 MB)
Total: 2,800 MB
On top of this there is the memory used by the OS. Either pick a larger instance configuration or change the default settings. FYI, here is the recommendation on the H/W configuration for the different nodes.
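As a rough sanity check, here is the same tally for the 5-map-slot setup attempted in the question, still using the book's default daemon and heap figures (the actual EMR defaults may differ, so treat this as an estimate):
Datanode: 1,000 MB
Tasktracker: 1,000 MB
Tasktracker child map tasks: 1,000 MB (5 * 200 MB)
Tasktracker child reduce tasks: 400 MB (2 * 200 MB)
Total: 3,400 MB
That is roughly twice the 1.7 GB a small instance offers, before the OS is even counted, so raising the slot count only makes sense together with a smaller child heap or a larger instance type.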