How to deal with unbalanced input to reduce tasks?

Recently I was asked how to deal with unbalanced input to reduce tasks. I thought about it for a while and tried to redistribute the data, but didn't come up with a good solution. Any advice?


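There are two main options:

  1. Increase the number of reduce tasks, so the data can be spread across more tasks. This only helps when the imbalance comes from many keys landing on the same reducer; a single very large key still goes to one task.
  2. Write a custom partitioner that distributes the keys more evenly across the tasks. [1]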

[1] http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Partitioner.html
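To make option 2 more concrete, here is a minimal sketch of the partitioning logic in plain Java, so it runs without Hadoop on the classpath. The class name SkewAwarePartitioner, the hotKeys set, and the sample keys are all hypothetical; it also assumes you already know which keys are oversized. In a real job this logic would go into a subclass of org.apache.hadoop.mapreduce.Partitioner, in its getPartition(key, value, numPartitions) method.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch, not a Hadoop class. It mirrors the shape of
// Partitioner.getPartition(key, value, numPartitions) using plain Strings.
final class SkewAwarePartitioner {
    // Keys assumed to be known in advance as oversized ("hot").
    private final Set<String> hotKeys;

    SkewAwarePartitioner(Set<String> hotKeys) {
        this.hotKeys = hotKeys;
    }

    // Non-negative hash modulo the partition count (ordinary hash partitioning).
    static int hashPartition(Object o, int numPartitions) {
        return (o.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    int getPartition(String key, String value, int numPartitions) {
        if (hotKeys.contains(key)) {
            // Spread a hot key's records over the partitions by hashing the value,
            // so one reducer no longer receives every record for that key.
            return hashPartition(value, numPartitions);
        }
        // All other keys keep ordinary hash partitioning on the key.
        return hashPartition(key, numPartitions);
    }

    public static void main(String[] args) {
        SkewAwarePartitioner p =
                new SkewAwarePartitioner(Collections.singleton("hot"));
        int numPartitions = 4;
        int[] counts = new int[numPartitions];
        // Simulate 1000 records that all share the same hot key.
        for (int i = 0; i < 1000; i++) {
            counts[p.getPartition("hot", "record-" + i, numPartitions)]++;
        }
        for (int i = 0; i < numPartitions; i++) {
            System.out.println("partition " + i + ": " + counts[i] + " records");
        }
    }
}
```

Note the trade-off: once a hot key is split across several reducers, no single reducer sees all of its values, so each one produces only a partial result for that key and a follow-up step (for example a second job) has to combine them. This works for aggregations that can be merged, such as sums or counts.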
