Hadoop MapReduce Program
While trying the MapReduce programming example from the Hadoop in Action book (based on the Hadoop 0.20 API), I got the error:
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
But as far as I can tell, I am passing everything properly. It would be really helpful if someone could help me with this.
Here is the code. It's the same code as in the book.
@SuppressWarnings("unused")
public class CountPatents extends Configured implements Tool {
@SuppressWarnings("deprecation")
public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
public void map(Text key, Text value,OutputCollector<Text, Text> output,Reporter reporter) throws IOException {
output.collect(value, key);
}
}
public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
int count=0;
while(values.hasNext()){
count=count+1;
values.next();
}
output.collect(key, new IntWritable(count));
}
}
public int run(String[] args) throws Exception {
Configuration conf = getConf();
JobConf job = new JobConf(conf, CountPatents.class);
Path in = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJobName("MyJob");
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
job.setInputFormat(KeyValueTextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
开发者_如何学C job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.set("key.value.separator.in.input.line", ",");
JobClient.runJob(job);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
System.exit(res);
}
}
From a quick look (without running the code locally), it looks like you are setting the job's output value type to Text with job.setOutputValueClass(Text.class);, but the output value type of your reducer is IntWritable. That is likely the error.
You also missed a call to declare the map output value class. Since the mapper emits Text values (not IntWritable), it should be:
job.setMapOutputValueClass(Text.class);
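For completeness, the full set of type declarations consistent with this mapper (which emits Text values) and this reducer (which emits IntWritable values) would be along these lines:

job.setMapOutputKeyClass(Text.class);       // mapper emits Text keys
job.setMapOutputValueClass(Text.class);     // mapper emits Text values
job.setOutputKeyClass(Text.class);          // reducer emits Text keys
job.setOutputValueClass(IntWritable.class); // reducer emits IntWritable counts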
The same problem occurs with the new 0.20 interface, using the new Job object in place of JobConf.
The error should be in the output from the reducer.
Your reduce class definition is as follows:
public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable>
so the output value should be of type IntWritable.
However, you have specified job.setOutputValueClass(Text.class);, so according to the configuration, the output of the reducer should be Text.
Solution: in the configuration, add the following lines (the mapper emits Text values, so the map output value class must be Text):
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
and modify:
job.setOutputValueClass(IntWritable.class);
Then try to run it again.
The map emits <Text, Text>, so set:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
In your reducer you are using OutputCollector<Text, IntWritable>, which means the output key class is Text and the output value class is IntWritable. However, in the run() method you have set job.setOutputKeyClass(Text.class); and job.setOutputValueClass(Text.class);.
Change job.setOutputValueClass(Text.class) to job.setOutputValueClass(IntWritable.class) and you are good to go!
Also, it is always better to set the mapper output key and value classes explicitly to avoid any discrepancy. Hadoop uses a mechanism based on the Writable interface instead of native Java serialization. Unlike Java serialization, this mechanism does not encapsulate the class name in the serialized entity. Hence, the byte arrays representing Writable instances (the reducer's input keys and values) cannot be deserialized without knowing the class being deserialized into, and that class must be instantiated explicitly. This information needs to be provided by invoking setMapOutputKeyClass and setMapOutputValueClass on the job instance.
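As a small illustration of that point (a minimal sketch of my own, not part of the original answer; it round-trips a Writable with plain java.io streams, outside MapReduce):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        // Serialize: a Writable writes only its field bytes, never its class name.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new IntWritable(42).write(new DataOutputStream(bytes));

        // Deserialize: the reader must already know the target class and
        // instantiate it itself -- the same knowledge the framework obtains
        // from setMapOutputKeyClass/setMapOutputValueClass.
        IntWritable restored = new IntWritable();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(restored.get()); // prints 42
    }
}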
public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
    public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        output.collect(value, key);
    }
}

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int count = 0;
        while (values.hasNext()) {
            count = count + 1;
            values.next();
        }
        output.collect(key, new IntWritable(count));
    }
}

public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    JobConf job = new JobConf(conf, CountPatents.class);
    Path in = new Path(args[0]);
    Path out = new Path(args[1]);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    job.setJobName("MyJob");
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    // The map output types (Text, Text) must be declared explicitly,
    // since they no longer match the final output value type.
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.set("key.value.separator.in.input.line", ",");
    JobClient.runJob(job);
    return 0;
}

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
    System.exit(res);
}
}
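For context (an illustrative example, not taken from the book): with KeyValueTextInputFormat and key.value.separator.in.input.line set to ",", an input line such as 3858241,956203 reaches the mapper as key 3858241 and value 956203. The mapper swaps the pair, so the reducer sees each cited patent as a key with its citing patents as the values, and it emits the citation count per patent, e.g. 956203 followed by 1.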