MapReduce value list order problem

As we know, Hadoop groups values by key and sends them to the same reduce task. Suppose I have the following lines in a file on HDFS: line1, line2, line3, ..., linen. In the map task I output the filename and the line. In the reducer I receive the values in a different order, for example key => {line3, line1, line2, ...}. My problem is that I want to get this value list in the order in which the lines appear in the file, i.e. key => {line1, line2, ..., linen}. Is there any way of doing this?


If you are using TextInputFormat, the mapper input is a <LongWritable, Text> pair. The LongWritable part (the key) is the position of the line in the file: the byte offset from the start of the file, not the line number. You can use it to keep track of which line came first. For example, the mapper can output <Filename, TextPair(Position, Line)> instead of the <Filename, Line> you output now. The reducer can then sort the values it receives for each key by the first part of the pair (the Position), and you should get the lines back in the same order as in the file.
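To illustrate the reducer-side step described above, here is a minimal sketch in plain Java with no Hadoop dependencies. The names `OffsetOrderSketch`, `OffsetLine`, and `restoreFileOrder` are made up for this example; `OffsetLine` stands in for the `TextPair(Position, Line)` value, and the method shows roughly what a reducer could do with the values of one key. It assumes all values for a key fit in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OffsetOrderSketch {

    // Hypothetical stand-in for the (Position, Line) pair emitted by the mapper.
    static final class OffsetLine {
        final long offset;   // byte offset of the line in the file (the mapper's input key)
        final String line;   // the line text (the mapper's input value)

        OffsetLine(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Roughly what the reducer would do for one key: buffer the values,
    // sort them by byte offset, and return the lines in file order.
    static List<String> restoreFileOrder(Iterable<OffsetLine> values) {
        List<OffsetLine> buffered = new ArrayList<>();
        for (OffsetLine v : values) {
            buffered.add(v);
        }
        buffered.sort(Comparator.comparingLong((OffsetLine v) -> v.offset));

        List<String> lines = new ArrayList<>();
        for (OffsetLine v : buffered) {
            lines.add(v.line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Values as they might arrive at the reducer, in arbitrary order.
        List<OffsetLine> shuffled = List.of(
                new OffsetLine(12L, "line3"),
                new OffsetLine(0L, "line1"),
                new OffsetLine(6L, "line2"));
        System.out.println(restoreFileOrder(shuffled));
    }
}
```

In a real reducer, Hadoop reuses the value object while iterating, so each value would need to be copied before it is buffered. If the value list for a key is too large to hold in memory, a secondary sort on a composite key is the usual alternative to sorting inside the reducer.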
