
Hadoop MapReduce Program

While trying the MapReduce programming example from the Hadoop in Action book, based on the Hadoop 0.20 API, I got the following error:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text

But as far as I can tell, I am passing everything properly. It would be really helpful if someone could point out the problem.

Here is the code. It is the same code as in the book.

@SuppressWarnings("unused")
public class CountPatents extends Configured implements Tool {
    @SuppressWarnings("deprecation")

    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value,OutputCollector<Text, Text> output,Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }
public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int count=0;
        while(values.hasNext()){
            count=count+1;

            values.next();

        }


        output.collect(key, new IntWritable(count));
    }
}


    public int run(String[] args) throws Exception {

    Configuration conf = getConf();
    JobConf job = new JobConf(conf, CountPatents.class);
    Path in = new Path(args[0]);
    Path out = new Path(args[1]);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    job.setJobName("MyJob");
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
  开发者_如何学C  job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.set("key.value.separator.in.input.line", ",");
    JobClient.runJob(job);
    return 0;
    }
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);


    }

    }


From a quick look (not running the code locally), it looks like you are setting the job's output value type to Text with job.setOutputValueClass(Text.class), but the output value type of your reducer is IntWritable. That is likely the error.
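For reference, a minimal sketch of the matching configuration under the old mapred API, reusing the job variable from the question's run() method (the IntWritable line is the change this answer implies, not code from the question):

    // Reduce is declared as Reducer<Text, Text, Text, IntWritable>,
    // so the job's final output types must match:
    job.setOutputKeyClass(Text.class);          // reducer output key
    job.setOutputValueClass(IntWritable.class); // reducer output value, not Text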


You missed a call:

job.setMapOutputValueClass(IntWritable.class);

The same problem occurs with the new 0.20 interface and the new Job object used in place of JobConf.
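A hedged sketch of the same idea with the new 0.20 API, using org.apache.hadoop.mapreduce.Job in place of JobConf (the job name is illustrative; note that the class passed to setMapOutputValueClass must be whatever the mapper actually emits, which for the question's mapper is Text):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    Job job = new Job(conf, "MyJob");       // new-API replacement for JobConf
    job.setMapOutputKeyClass(Text.class);   // what the mapper emits
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);      // what the reducer emits
    job.setOutputValueClass(IntWritable.class);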


The error should be in the output from the reducer.

Your Reduce class definition is as follows:

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable>

so the output value should be of type IntWritable.

However, you have specified job.setOutputValueClass(Text.class);

So, as per the configuration, the output of the reducer should be Text.

Solution: in the configuration, add the following lines:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

and modify: job.setOutputValueClass(IntWritable.class);

Then try to run it again.
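Combined with the next answer's observation that the map actually emits <Text, Text> pairs, the consolidated configuration for the question's run() method would be (a sketch, not this answer's exact lines):

    // map output types: MapClass collects (value, key) pairs of Text
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    // final output types: Reduce emits (Text, IntWritable)
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);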


The map emits <Text, Text> pairs, so set:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);


In your reducer function you are using OutputCollector<Text, IntWritable>, which means the output key class is Text and the output value class is IntWritable. However, in the main (run) function you have set job.setOutputKeyClass(Text.class); and job.setOutputValueClass(Text.class);.

Change job.setOutputValueClass(Text.class) to job.setOutputValueClass(IntWritable.class) and you are good to go!

Also, it is always better to set the mapper output key and value types explicitly, to avoid any discrepancy. Hadoop uses a Writable-interface-based mechanism instead of native Java serialization. Unlike Java serialization, this mechanism does not encapsulate the class name in the serialized entity. Hence the explicit class name is required to instantiate these classes between the mapper and the reducer, as it is not possible to deserialize a byte array representing a Writable instance without knowing the class being deserialized into (the reducer's input key and value instances). This information must be provided explicitly by invoking setMapOutputKeyClass and setMapOutputValueClass on the Job instance.
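To illustrate why the target class must be known up front, here is a small self-contained round-trip sketch using Hadoop's DataOutputBuffer and DataInputBuffer from org.apache.hadoop.io (the value 42 is arbitrary):

    import org.apache.hadoop.io.DataInputBuffer;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.IntWritable;

    public class WritableRoundTrip {
        public static void main(String[] args) throws Exception {
            IntWritable original = new IntWritable(42);

            // Serialize: only the raw int value is written;
            // no class name is recorded in the byte stream.
            DataOutputBuffer out = new DataOutputBuffer();
            original.write(out);

            // Deserialize: the reader must already know the class and
            // create an instance before reading, which is exactly the
            // information setMapOutputKeyClass/setMapOutputValueClass
            // give the framework.
            DataInputBuffer in = new DataInputBuffer();
            in.reset(out.getData(), out.getLength());
            IntWritable copy = new IntWritable();
            copy.readFields(in);

            System.out.println(copy.get()); // prints 42
        }
    }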


With the fix applied, the full listing becomes:

public class CountPatents extends Configured implements Tool {

    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int count = 0;
            while (values.hasNext()) {
                count = count + 1;
                values.next();
            }
            output.collect(key, new IntWritable(count));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, CountPatents.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        // Map output types differ from the final output types,
        // so set them explicitly (see the earlier answers).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);
    }
}
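Once packaged into a jar, the job would typically be launched like this (the jar name and HDFS paths are illustrative):

    hadoop jar countpatents.jar CountPatents /user/me/patents/input /user/me/patents/output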
