Hadoop MapReduce with already sorted files

2023-03-15 11:06 问答作者：

I'm working with Hadoop MapReduce. I've got data in HDFS and data in each file is already sorted. Is it possible to force MapReduce not to resort the data after map phase? I've tried to change the map.sort.class to no-op, but it didn't work (i.e. the data wasn't sorted as I'd expected). Does 开发者_Python百科anyone tried doing something similar and managed to achieve it?

I think it depends on what style result you want, sorted result or unsorted result?

If you need result be sorted, I think hadoop is not suitable to do this work. There are two reasons:

INPUT DATA will be stored in different chunk(if big enough) and partitioned into multi-splits. Each one split will be mapped to one map task and all output of map tasks will gathered(after processes of partitioned/sorted/combined/copied/merged) as reduce's input. It is hard to keep keys in order among these stages.
Sort function exists not only after map process in map task. When do merge process during reduce task, there is sort option,too.

If you do not need result be sorted,I think this patch may be what you want:

Support no sort dataflow in map output and reduce merge phrase : https://issues.apache.org/jira/browse/MAPREDUCE-3397

继续阅读：mapreduce

Hadoop MapReduce with already sorted files

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？