How to deal with unbalanced input of reduce task?
Recently I was asked how to deal with unbalanced input to a reduce task. I thought about it for a while and tried to redistribute the data, but didn't come up with a good solution. Any advice?
Actually, you have two options:

- Increase the number of reduce tasks, so the data may spread more evenly across them.
- Rewrite the partitioner to distribute the keys more evenly over the tasks. [1]
[1] http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Partitioner.html
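One common way to implement the second option is "salting" hot keys so their records are spread over several reducers instead of piling onto one. The sketch below shows the partitioning logic in plain Java (no Hadoop dependency) so it runs standalone; in a real job you would extend `org.apache.hadoop.mapreduce.Partitioner<K, V>` from [1] and put this logic in `getPartition`. The `SaltedPartitioner` class name and the set of known hot keys are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Sketch only: a real implementation would extend
// org.apache.hadoop.mapreduce.Partitioner<Text, V> and override
// getPartition(Text key, V value, int numPartitions).
public class SaltedPartitioner {
    // Hypothetical set of keys known (e.g. from a sampling pass) to be skewed.
    private final Set<String> hotKeys;
    private final Random rnd = new Random();

    public SaltedPartitioner(Set<String> hotKeys) {
        this.hotKeys = hotKeys;
    }

    public int getPartition(String key, int numPartitions) {
        if (hotKeys.contains(key)) {
            // Spread records of a hot key randomly over all partitions.
            return rnd.nextInt(numPartitions);
        }
        // Same formula Hadoop's default HashPartitioner uses:
        // mask off the sign bit, then take the modulo.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        SaltedPartitioner p = new SaltedPartitioner(Set.of("hot"));
        int numPartitions = 10;
        // A normal key is always routed to the same partition.
        System.out.println(p.getPartition("normal", numPartitions)
                == p.getPartition("normal", numPartitions)); // true
        // Records of a hot key end up on many partitions.
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.add(p.getPartition("hot", numPartitions));
        }
        System.out.println(seen.size() > 1);
    }
}
```

Note the trade-off: spreading a hot key over several reducers breaks the guarantee that all values for a key arrive at one reducer, so you need a second job (or a combiner-style pass) to merge the partial results per key.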