Problem with Hadoop Streaming -file option for Java class files
I am struggling with a very basic issue with the "-file" option in Hadoop Streaming.
First, I tried the basic streaming example:
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* -output gutenberg-outputtstchk22
which worked absolutely fine.
Then I copied the IdentityMapper.java source code, compiled it, placed the resulting class file in the /home/hadoop folder, and ran the following in the terminal:
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -file ~/IdentityMapper.class \
    -mapper IdentityMapper.class \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* -output gutenberg-outputtstch6
The execution failed with the following error in the stderr file:
java.io.IOException: Cannot run program "IdentityMapper.class": java.io.IOException: error=2, No such file or directory
I then tried again, copying the IdentityMapper.class file into the Hadoop installation directory and executing the following:
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -file IdentityMapper.class \
    -mapper IdentityMapper.class \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* -output gutenberg-outputtstch5
Unfortunately, I got the same error again.
It would be great if you could help me with this, as I cannot move any further without resolving it.
Thanks in advance.
Why do you want to compile the class? It is already compiled in the Hadoop jars. You only need to pass the class name (org.apache.hadoop.mapred.lib.IdentityMapper), because Hadoop uses reflection to instantiate a new instance of the mapper class.
You just have to make sure the class is on the classpath, e.g. inside a jar you pass to the job.
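For example, you can confirm the class already ships with your installation and then reference it by its fully qualified name, exactly as in your first, working command (the core jar name below is an assumption based on your 0.20.203.0 version, and the output path is illustrative):

# Check that IdentityMapper is already inside the Hadoop core jar (jar name assumed):
jar tf /usr/local/hadoop/hadoop-core-0.20.203.0.jar | grep IdentityMapper
# org/apache/hadoop/mapred/lib/IdentityMapper.class

# Nothing to compile or to pass with -file; just use the class name:
bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* -output gutenberg-identity-out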
Same answer as for your other question: you can't really use -file to ship jars, because Hadoop doesn't support multiple jars (beyond those already on the CLASSPATH). Check the streaming docs:
At least as late as version 0.14, Hadoop does not support multiple jar files. So, when specifying your own custom classes you will have to pack them along with the streaming jar and use the custom jar instead of the default hadoop streaming jar.
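In practice that means adding your compiled class to a copy of the streaming jar and running that copy instead. A rough sketch, assuming your mapper was compiled into its own package (com/example/MyMapper.class is purely illustrative; adjust the paths to match your package declaration):

# Copy the streaming jar so the original stays untouched:
cp contrib/streaming/hadoop-streaming-0.20.203.0.jar /home/hadoop/my-streaming.jar

# Add the compiled class, preserving its package directory structure:
cd /home/hadoop/classes        # directory that contains com/example/MyMapper.class
jar uf /home/hadoop/my-streaming.jar com/example/MyMapper.class

# Run the job with the custom jar and refer to the mapper by class name, not file name:
cd /usr/local/hadoop
bin/hadoop jar /home/hadoop/my-streaming.jar \
    -mapper com.example.MyMapper \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* -output gutenberg-custom-out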
I ran into a similar problem, and adding the jar file to HADOOP_CLASSPATH fixed it. For more information, see: http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
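A minimal sketch of what that looks like (the jar path and class name are placeholders; the -libjars generic option described in the same post additionally ships the jar to the task nodes, and generic options have to come before the streaming-specific ones):

# Make the jar visible to the client JVM that submits the job:
export HADOOP_CLASSPATH=/home/hadoop/lib/mylib.jar

# Ship it to the task nodes as well and run the streaming job:
bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -libjars /home/hadoop/lib/mylib.jar \
    -mapper com.example.MyMapper \
    -reducer /bin/wc \
    -input gutenberg/* -output gutenberg-libjars-out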