开发者

Specify Hadoop mapreduce input keys directly (not from a file)

I'd like to generate some data using a mapreduce. I'd like to invoke the job with one parameter N, and get Map called with each integer from 1 to N, once.

Obviously I want a Mapper<IntWritable, NullWritable, <my output types>>...that's easy. But I can't figure开发者_如何学Python out how to generate the input data! Is there an InputFormat I'm not seeing somewhere that lets me just pull keys + values from a collection directly?


Do you want each mapper to process all integers from 1 to N? Or do you want to distribute the processing of integers 1 to N across the concurrently running mappers?

If the former, I believe you'll need to create a custom InputFormat. If the latter, the easiest way might be to generate a text file with integers 1 to N, each integer on one line, and use LineInputFormat.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜