开发者

Generating Multiple Output files with Hadoop 0.20+

I am trying to output the results of my reducer to multiple files. The data results are all contained in one file, and the rest of the results are split based on a category in their respected files. I know with 0.18 that you can do this with MultipleOutputs and it has not been removed. However, I am trying to make my application 0.20+ compliant. The existing Multiple outputs functionality st开发者_Go百科ill requires JobConf (which my application uses Job, and Configuration). How can I generate multiple outputs based on the key?


Support for MultipleOutputs isn't in 0.20. You will need to use the older API.

It has been added into 0.21 which is currently unreleased as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

This thread on the mailing list talks about this problem.


You can do this in Hadoop 0.20, just that as mentioned you have to use the older API.

There's some very rough code to do so in http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

The resulting jar writes each record to a file named after its (sanitized) key.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜