Specify Hadoop mapreduce input keys directly (not from a file)
I'd like to generate some data using a mapreduce. I'd like to invoke the job with one parameter N, and get Map called with each integer from 1 to N, once.
Obviously I want a Mapper<IntWritable, NullWritable, <my output types>>
...that's easy. But I can't figure out how to generate the input data! Is there an InputFormat I'm not seeing that lets me pull keys and values directly from a collection?
Do you want each mapper to process all integers from 1 to N? Or do you want to distribute the processing of integers 1 to N across the concurrently running mappers?
If the former, I believe you'll need to create a custom InputFormat. If the latter, the easiest way is probably to generate a text file with the integers 1 to N, one integer per line, and use Hadoop's NLineInputFormat so the lines are split across mappers.
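For the second approach, the seed file can be generated with plain Java before submitting the job. This is a minimal sketch (the class and file names are placeholders, not from the original post); the file it produces would then be passed to the job as its input path:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SeedInput {
    // Write the integers 1..n to the given path, one integer per line,
    // so each line becomes one record for the mappers to consume.
    static void writeSeedFile(Path path, int n) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            lines.add(Integer.toString(i));
        }
        Files.write(path, lines);
    }

    public static void main(String[] args) throws IOException {
        Path seed = Paths.get("seed.txt");
        writeSeedFile(seed, 10);
        System.out.println(Files.readAllLines(seed).size() + " lines written");
    }
}
```

With NLineInputFormat you can also set `mapreduce.input.lineinputformat.linespermap` to control how many integers each mapper receives.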