
Hadoop MapReduce Program

While trying the MapReduce programming example from the Hadoop in Action book, based on the Hadoop 0.20 API, I got the following error:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text

But as far as I can tell, I am passing everything properly. It would be really helpful if someone could point out the problem.

Here is the code. It is the same code as in the book.

@SuppressWarnings("unused")
public class CountPatents extends Configured implements Tool {
    @SuppressWarnings("deprecation")

    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value,OutputCollector<Text, Text> output,Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }
public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int count=0;
        while(values.hasNext()){
            count=count+1;

            values.next();

        }


        output.collect(key, new IntWritable(count));
    }
}


    public int run(String[] args) throws Exception {

    Configuration conf = getConf();
    JobConf job = new JobConf(conf, CountPatents.class);
    Path in = new Path(args[0]);
    Path out = new Path(args[1]);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    job.setJobName("MyJob");
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
  开发者_如何学C  job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.set("key.value.separator.in.input.line", ",");
    JobClient.runJob(job);
    return 0;
    }
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);


    }

    }


From a quick look (not running the code locally), it looks like you are setting the job's output value type to Text with job.setOutputValueClass(Text.class), but the output value type of your reducer is IntWritable. That is likely the error.
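For reference, a minimal sketch of the matching configuration under the old mapred API, reusing the job variable from the question's run() method (the IntWritable line is the change this answer implies, not code from the question):

    // Reduce is declared as Reducer<Text, Text, Text, IntWritable>,
    // so the job's final output types must match:
    job.setOutputKeyClass(Text.class);          // reducer output key
    job.setOutputValueClass(IntWritable.class); // reducer output value, not Text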


You missed a call:

job.setMapOutputValueClass(IntWritable.class);

The same problem occurs with the new 0.20 interface and the new Job object used in place of JobConf.
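A hedged sketch of the same idea with the new 0.20 API, using org.apache.hadoop.mapreduce.Job in place of JobConf (the job name is illustrative; note that the class passed to setMapOutputValueClass must be whatever the mapper actually emits, which for the question's mapper is Text):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    Job job = new Job(conf, "MyJob");       // new-API replacement for JobConf
    job.setMapOutputKeyClass(Text.class);   // what the mapper emits
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);      // what the reducer emits
    job.setOutputValueClass(IntWritable.class);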


The error should be in the output from the reducer.

Your Reduce class definition is as follows:

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable>

so the output value should be of type IntWritable.

However, you have specified job.setOutputValueClass(Text.class);

So, as per the configuration, the output of the reducer should be Text.

Solution: in the configuration, add the following lines:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

and modify: job.setOutputValueClass(IntWritable.class);

Then try to run it again.
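Combined with the next answer's observation that the map actually emits <Text, Text> pairs, the consolidated configuration for the question's run() method would be (a sketch, not this answer's exact lines):

    // map output types: MapClass collects (value, key) pairs of Text
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    // final output types: Reduce emits (Text, IntWritable)
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);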


The map emits <Text, Text> pairs, so set:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);


In your reducer function you are using OutputCollector<Text, IntWritable>, which means the output key class is Text and the output value class is IntWritable. However, in the main (run) function you have set job.setOutputKeyClass(Text.class); and job.setOutputValueClass(Text.class);.

Change job.setOutputValueClass(Text.class) to job.setOutputValueClass(IntWritable.class) and you are good to go!

Also, it is always better to set the mapper output key and value types explicitly, to avoid any discrepancy. Hadoop uses a Writable-interface-based mechanism instead of native Java serialization. Unlike Java serialization, this mechanism does not encapsulate the class name in the serialized entity. Hence the explicit class name is required to instantiate these classes between the mapper and the reducer, as it is not possible to deserialize a byte array representing a Writable instance without knowing the class being deserialized into (the reducer's input key and value instances). This information must be provided explicitly by invoking setMapOutputKeyClass and setMapOutputValueClass on the Job instance.
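To illustrate why the target class must be known up front, here is a small self-contained round-trip sketch using Hadoop's DataOutputBuffer and DataInputBuffer from org.apache.hadoop.io (the value 42 is arbitrary):

    import org.apache.hadoop.io.DataInputBuffer;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.IntWritable;

    public class WritableRoundTrip {
        public static void main(String[] args) throws Exception {
            IntWritable original = new IntWritable(42);

            // Serialize: only the raw int value is written;
            // no class name is recorded in the byte stream.
            DataOutputBuffer out = new DataOutputBuffer();
            original.write(out);

            // Deserialize: the reader must already know the class and
            // create an instance before reading, which is exactly the
            // information setMapOutputKeyClass/setMapOutputValueClass
            // give the framework.
            DataInputBuffer in = new DataInputBuffer();
            in.reset(out.getData(), out.getLength());
            IntWritable copy = new IntWritable();
            copy.readFields(in);

            System.out.println(copy.get()); // prints 42
        }
    }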


With the fix applied, the full listing becomes:

public class CountPatents extends Configured implements Tool {

    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int count = 0;
            while (values.hasNext()) {
                count = count + 1;
                values.next();
            }
            output.collect(key, new IntWritable(count));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, CountPatents.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        // Map output types differ from the final output types,
        // so set them explicitly (see the earlier answers).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);
    }
}
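Once packaged into a jar, the job would typically be launched like this (the jar name and HDFS paths are illustrative):

    hadoop jar countpatents.jar CountPatents /user/me/patents/input /user/me/patents/output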
