
How do I tell Hadoop how much memory to allocate to a single map task?

I've created an Elastic MapReduce job, and I'm trying to optimize its performance.

At the moment I'm trying to increase the number of mappers per instance. I'm doing this via mapred.tasktracker.map.tasks.maximum=X:

elastic-mapreduce --create --alive --num-instances 3 \
 --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
 --args -s,mapred.tasktracker.map.tasks.maximum=5

Each time I set X above 2 per small instance, the initialization fails, from which I conclude that Hadoop allocates about 800 MB of memory per map task. That seems excessive to me; I'd like it to be 400 MB at most.

How do I tell Hadoop to use less memory for each map task?


Check the mapred.child.java.opts property. It defaults to -Xmx200m, which means 200 MB of heap for each map/reduce task.
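
If your job's driver implements Tool and is run through ToolRunner, you can also override this per job with a -D option on the command line. A minimal sketch, where the jar name, driver class, and paths are placeholders:

hadoop jar my-job.jar com.example.MyDriver \
 -D mapred.child.java.opts=-Xmx400m \
 input/ output/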

An EC2 small instance has 1.7 GB of memory. Here is the memory used by the Hadoop processes on a tasktracker node with the default settings (figures from "Hadoop: The Definitive Guide"):

Datanode                        1,000 MB
Tasktracker                     1,000 MB
Tasktracker child map tasks       400 MB (2 × 200 MB)
Tasktracker child reduce tasks    400 MB (2 × 200 MB)

That totals 2,800 MB, well above what a small instance provides.

On top of this, there is the memory needed by the OS. Either pick a larger instance type or change the default settings. FYI, there are published recommendations on the hardware configuration for the different node types.
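
If you want to change the cluster-wide defaults, the same configure-hadoop bootstrap action from the question can set both the child heap and the slot count. A hedged sketch, reusing the question's -s flag style; the 400 MB heap and 4 map slots are illustrative values, and the exact --args quoting may vary with your EMR CLI version:

elastic-mapreduce --create --alive --num-instances 3 \
 --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
 --args "-s,mapred.child.java.opts=-Xmx400m,-s,mapred.tasktracker.map.tasks.maximum=4"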

