Problem with -libjars in hadoop

Posted: 2023-03-25 09:00

I am trying to run a MapReduce job on Hadoop, but I am getting an error and I am not sure what is going wrong. I have to pass library JARs that are required by my mapper.

I am executing the following in the terminal:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar /home/hadoop/vardtst.jar -libjars /home/hadoop/clui.jar -libjars /home/hadoop/model.jar gutenberg ou101

and I am getting the following Exception:

    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

Please help. Thanks.

Also worth noting is a subtle but important point: the way to specify additional JARs for the JVMs running the distributed map and reduce tasks is very different from the way to specify them for the JVM running the job client.

-libjars makes the JARs available only to the JVMs running the remote map and reduce tasks.
To make the same JARs available to the client JVM (the JVM that is created when you run the hadoop jar command), you need to set the HADOOP_CLASSPATH environment variable:

$ export LIBJARS=/path/jar1,/path/jar2
$ export HADOOP_CLASSPATH=/path/jar1:/path/jar2
$ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value

See: http://grepalex.com/2013/02/25/hadoop-libjars/
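
The two variables in the commands above hold the same JARs and differ only in the separator: -libjars takes a comma-separated list, while HADOOP_CLASSPATH is colon-separated like any Unix classpath. If you would rather derive one from the other than maintain both by hand, something like this minimal Java sketch could do it. LibJarsPaths and toClasspath are names made up for this illustration (they are not part of Hadoop), and the ':' separator assumes a Unix-like system.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Illustrative helper, not part of Hadoop: converts the comma-separated
// list given to -libjars into the colon-separated form used for
// HADOOP_CLASSPATH on Unix-like systems.
final class LibJarsPaths {

    static String toClasspath(String libJars) {
        return Arrays.stream(libJars.split(","))
                .map(String::trim)              // tolerate spaces after commas
                .filter(s -> !s.isEmpty())      // skip empty entries
                .collect(Collectors.joining(":"));
    }

    public static void main(String[] args) {
        // Same placeholder paths as in the export commands above.
        System.out.println(toClasspath("/path/jar1,/path/jar2"));
    }
}
```

Run as is, it prints /path/jar1:/path/jar2, which is the value exported as HADOOP_CLASSPATH in the commands above.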

Another cause of incorrect -libjars behaviour can be a wrong implementation or initialization of the custom job class:

The job class must implement the Tool interface.
The Configuration instance must be obtained by calling getConf() rather than by creating a new instance.

See: http://kickstarthadoop.blogspot.ca/2012/05/libjars-not-working-in-custom-mapreduce.html

When you specify -libjars with the hadoop jar command, first make sure that your driver class looks like the one below:

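    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class myDriverClass extends Configured implements Tool {

        public static void main(String[] args) throws Exception {
            // ToolRunner parses the generic options (such as -libjars) before calling run()
            int res = ToolRunner.run(new Configuration(), new myDriverClass(), args);
            System.exit(res);
        }

        @Override
        public int run(String[] args) throws Exception {
            // Configuration already processed by ToolRunner
            Configuration conf = getConf();
            // On newer Hadoop releases, Job.getInstance(conf, "My Job") replaces this constructor
            Job job = new Job(conf, "My Job");

            ...
            ...

            return job.waitForCompletion(true) ? 0 : 1;
        }
    }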

Now edit your "hadoop jar" command as shown below. -libjars is a generic option, so it has to come before your application's own arguments:

hadoop jar YourApplication.jar [myDriverClass] -libjars path/to/jar/file args

Now let's look at what happens underneath. We handle the new command-line arguments by implementing the Tool interface. ToolRunner runs classes that implement Tool; it works together with GenericOptionsParser to parse the generic Hadoop command-line arguments and modify the Tool's Configuration accordingly.

Within main() we call ToolRunner.run(new Configuration(), new myDriverClass(), args). This parses the generic arguments and then runs the given Tool through Tool.run(String[]). It uses the given Configuration, or builds one if it is null, and then sets the Tool's configuration to the possibly modified version of that conf.

Within the run method, calling getConf() returns that modified Configuration, so make sure your code contains the line below. If you implement everything else but still use Configuration conf = new Configuration(), the options parsed from the command line, including -libjars, are lost.

Configuration conf = getConf();
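
To make the difference concrete, here is a toy model of that flow in plain Java. It does not use any Hadoop classes: MiniConf, MiniTool, runTool and the key "demo.libjars" are hypothetical stand-ins invented for this sketch, and the real GenericOptionsParser does considerably more. It only shows why a tool that reads getConf() sees the parsed option while a tool that builds a fresh configuration does not.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model only: MiniConf, MiniTool and runTool are hypothetical stand-ins
// for Configuration, Tool and ToolRunner.run. They are NOT Hadoop classes.
final class GetConfDemo {

    static final class MiniConf {
        final Map<String, String> values = new HashMap<>();
    }

    abstract static class MiniTool {
        private MiniConf conf;
        void setConf(MiniConf conf) { this.conf = conf; }
        MiniConf getConf() { return conf; }
        abstract String run(String[] args);
    }

    // Mimics the runner: consume a leading "-libjars <list>", record it in
    // the configuration, hand that configuration to the tool, then call
    // run() with the remaining arguments.
    static String runTool(MiniConf conf, MiniTool tool, String[] args) {
        int first = 0;
        if (args.length >= 2 && args[0].equals("-libjars")) {
            conf.values.put("demo.libjars", args[1]); // hypothetical key name
            first = 2;
        }
        tool.setConf(conf);
        return tool.run(Arrays.copyOfRange(args, first, args.length));
    }

    // Reads the configuration prepared by the runner.
    static final class GoodTool extends MiniTool {
        String run(String[] args) {
            return getConf().values.get("demo.libjars");
        }
    }

    // Builds a fresh configuration and so never sees the parsed option.
    static final class BadTool extends MiniTool {
        String run(String[] args) {
            return new MiniConf().values.get("demo.libjars");
        }
    }

    public static void main(String[] args) {
        String[] cli = {"-libjars", "/path/jar1,/path/jar2", "in", "out"};
        System.out.println(runTool(new MiniConf(), new GoodTool(), cli));
        System.out.println(runTool(new MiniConf(), new BadTool(), cli));
    }
}
```

The first line printed is the JAR list and the second is null, which mirrors what happens to -libjars when run() uses new Configuration() instead of getConf().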

I found the answer: it was throwing the error because I was missing the main class name in the command.

The correct way to execute it is:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar /home/hadoop/vardtst.jar VardTest -libjars /home/hadoop/clui.jar,/home/hadoop/model.jar gutenberg ou101

where VardTest is the class containing the main() method.
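
For anyone wondering why the missing class name surfaced as a class-loading stack trace: as far as I understand it, hadoop jar takes the main class from the JAR's Main-Class manifest attribute and, if there is none, treats the first argument after the JAR path as the class name. In my original command that argument was -libjars. The sketch below reproduces that lookup with the standard java.util.jar classes only. It is a simplified illustration, not Hadoop's actual RunJar code, and MainClassResolver and resolve are names invented here.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Simplified illustration of the main-class lookup; not Hadoop's RunJar code.
final class MainClassResolver {

    // argsAfterJar: everything that followed the JAR path on the command line.
    static String resolve(String jarPath, String[] argsAfterJar) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest(); // null when the JAR has no manifest
            if (manifest != null) {
                String mainClass = manifest.getMainAttributes().getValue("Main-Class");
                if (mainClass != null) {
                    return mainClass;
                }
            }
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no main class given");
        }
        // No Main-Class in the manifest: the first argument is taken as the class name.
        return argsAfterJar[0];
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: MainClassResolver <jar> [arguments that followed the jar]");
            return;
        }
        System.out.println(resolve(args[0], Arrays.copyOfRange(args, 1, args.length)));
    }
}
```

Pointing it at a JAR without a Main-Class entry and passing -libjars as the next argument returns -libjars as the "class" to load, which is consistent with the Class.forName frames in the trace above.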

Thanks
