
How can I run Hadoop with a Java class?

I am following the book Hadoop: The Definitive Guide.

I am confused by Example 3-1.

There is a Java source file, URLCat.java. I use javac to compile it into URLCat.class, then use jar to wrap it into a jar.

The book said to use

% hadoop URLCat hdfs://localhost/user/tom/quangle.txt

to run it. I have tried a lot of different ways, such as

% hadoop jar URLCat.jar .......

but it didn't work. I got errors like this:

Exception in thread "main" java.lang.ClassNotFoundException: hdfs://localhost/user/username/quangle/txt

What is the reason for this, and how do I do it right?


It's quite simple:

[me@myhost ~]$ hadoop jar
RunJar jarFile [mainClass] args...

So, what you want is hadoop jar yourJar.jar your.class.with.Main [any args]
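The rule behind that usage line, and behind the ClassNotFoundException in the question, can be modelled in a few lines of plain Java. This is a simplified sketch that assumes what the answers in this thread describe: a Main-Class entry in the jar's manifest takes priority, and otherwise the first argument after the jar is used as the class name. MainClassResolver and resolve are hypothetical names, not Hadoop's RunJar code.

```java
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/**
 * Hypothetical, simplified model of how "hadoop jar jarFile [mainClass] args..."
 * picks the class to run. Not Hadoop's actual RunJar source.
 */
final class MainClassResolver {

    /** The class to run and the arguments handed to its main method. */
    static final class Resolution {
        final String mainClass;
        final String[] mainArgs;

        Resolution(String mainClass, String[] mainArgs) {
            this.mainClass = mainClass;
            this.mainArgs = mainArgs;
        }
    }

    /**
     * @param manifest     the jar's manifest, or null if it has none
     * @param argsAfterJar the command-line arguments that follow the jar file name
     */
    static Resolution resolve(Manifest manifest, String[] argsAfterJar) {
        String fromManifest = (manifest == null)
                ? null
                : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        if (fromManifest != null) {
            // Main-Class is set: every remaining argument is passed to main().
            return new Resolution(fromManifest, argsAfterJar);
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
        }
        // No Main-Class: the first argument is taken as the class name. With
        // "hadoop jar URLCat.jar hdfs://..." that is the URL, hence ClassNotFoundException.
        return new Resolution(argsAfterJar[0],
                Arrays.copyOfRange(argsAfterJar, 1, argsAfterJar.length));
    }

    public static void main(String[] args) {
        Resolution r = resolve(null, new String[] {"URLCat", "hdfs://localhost/user/tom/quangle.txt"});
        System.out.println(r.mainClass + " " + Arrays.toString(r.mainArgs));
    }
}
```

So with a plain jar cvf jar (no Main-Class), the class name must come right after the jar file name.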


Of course you could use hadoop fs -cat, but that sort of misses the point (i.e. you're learning, not just trying to get it to work).

As per the book, you need to set your HADOOP_CLASSPATH environment variable. In my case, using the build example in the book, all of my classes are at: /media/data/hadefguide/book/build/classes

Here's an example:

hduser@MuleBox ~ $ export HADOOP_CLASSPATH=
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
Exception in thread "main" java.lang.NoClassDefFoundError: URLCat
Caused by: java.lang.ClassNotFoundException: URLCat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: URLCat.  Program will exit.
hduser@MuleBox ~ $ export HADOOP_CLASSPATH=/media/data/hadefguide/book/build/classes
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
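The only difference between the two runs above is whether URLCat.class can be found in some classpath entry. A simplified sketch of that lookup in plain Java follows; ClasspathProbe and its methods are hypothetical, it checks directories and jar files only, and it ignores real class-loader details such as parent delegation. With an empty HADOOP_CLASSPATH it finds nothing, roughly mirroring the first NoClassDefFoundError.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarFile;

/** Hypothetical helper: which classpath entry (if any) provides a class? */
final class ClasspathProbe {

    /** "no.gnome.URLCat" -> "no/gnome/URLCat.class" (path inside a directory or jar). */
    static String classFilePath(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    /**
     * Returns the first entry of the classpath string (entries separated by
     * File.pathSeparator, ':' on Unix) that contains the class, or null.
     */
    static String findEntry(String classpath, String fullyQualifiedName) {
        String relative = classFilePath(fullyQualifiedName);
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // this sketch simply skips empty entries
            }
            File file = new File(entry);
            if (file.isDirectory()) {
                if (new File(file, relative).isFile()) {
                    return entry;
                }
            } else if (file.isFile() && entry.endsWith(".jar")) {
                try (JarFile jar = new JarFile(file)) {
                    if (jar.getEntry(relative) != null) {
                        return entry;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Usage: java ClasspathProbe "$HADOOP_CLASSPATH" URLCat
        String entry = findEntry(args[0], args[1]);
        System.out.println(entry == null ? "not found" : "found in " + entry);
    }
}
```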


Not sure how useful this answer is now, but I faced the same issue today while working on an example from the same book (Hadoop: The Definitive Guide). I was able to execute an example program as follows:

  • Write your java code and save it as .java file

  • Compile your java program using:

    javac -classpath <path to hadoop core and commons-cli jar file> <path to your java program file>
    
  • Create a jar file containing your class file:

    jar cvf <jar file> <class files to add separated by space>
    
  • Execute the jar file using hadoop command line:

    hadoop jar <jar file name> <class name containing your main method> <argument to the main method>
    

    e.g.

    hadoop jar FileSystemCat.jar FileSystemCat hdfs://localhost/user/root/MyFiles/meet_a_seer.txt
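The compile step in the list above can also be run from Java through the standard javax.tools API, which takes the same -classpath and -d options as the javac command line. A minimal sketch, assuming a JDK rather than a bare JRE (otherwise no system compiler is available); CompileStep is a hypothetical name, and you still have to supply the Hadoop jar paths yourself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical wrapper around the system Java compiler (same effect as running javac). */
final class CompileStep {

    /**
     * Compiles the sources into outputDir and returns true on success.
     * classpath should list the Hadoop jars, joined with File.pathSeparator.
     */
    static boolean compile(String classpath, File outputDir, File... sources) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system compiler: run on a JDK, not a JRE");
        }
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new IllegalStateException("cannot create " + outputDir);
        }
        List<String> args = new ArrayList<>();
        args.add("-classpath");
        args.add(classpath);
        args.add("-d"); // like "javac -d urlcat_classes": keeps the package directories
        args.add(outputDir.getPath());
        for (File source : sources) {
            args.add(source.getPath());
        }
        // run(in, out, err, args...) uses System.in/out/err for nulls and returns 0 on success.
        return javac.run(null, null, null, args.toArray(new String[0])) == 0;
    }

    public static void main(String[] args) {
        // Usage: java CompileStep <classpath> <output dir> <source files...>
        File[] sources = new File[args.length - 2];
        for (int i = 2; i < args.length; i++) {
            sources[i - 2] = new File(args[i]);
        }
        System.exit(compile(args[0], new File(args[1]), sources) ? 0 : 1);
    }
}
```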
    

Hope it helps


The syntax of the command is a little bit different:

hadoop fs -cat hdfs:///user/tom/quangle.txt

Do you have the Hadoop home (its bin directory) in your PATH? Can you call hadoop without any parameters?


To make the hadoop URLCat command work, URLCat.jar needs to be on Hadoop's classpath. You can put it in Hadoop's lib/ directory for that.

For hadoop jar URLCat.jar to run on its own, the jar's manifest needs to define a Main-Class; otherwise hadoop treats the next argument on the command line as the class name. What you can try is hadoop jar URLCat.jar URLCat hdfs://...
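For the Main-Class route, the JDK jar tool can write the entry itself with its e option (for example jar cvfe URLCat.jar URLCat URLCat.class). The same thing can be done programmatically with java.util.jar, as in this sketch. JarWithMain is a hypothetical helper, and it assumes the compiled classes already sit in a directory laid out by package, as javac -d produces. With such a jar, hadoop jar URLCat.jar hdfs://... should no longer mistake the URL for a class name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: package a classes directory into a jar with a Main-Class entry. */
final class JarWithMain {

    static void build(Path classesDir, Path jarPath, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version must be present: java.util.jar skips the other main
        // attributes when writing a manifest that has no version.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);

        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            for (Path file : files) {
                // Entry names use '/' and are relative to classesDir, e.g. no/gnome/URLCat.class.
                String name = classesDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }

    /** Reads back the Main-Class attribute, or null if absent. */
    static String mainClassOf(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarWithMain <classes dir> <jar file> <main class>
        build(Paths.get(args[0]), Paths.get(args[1]), args[2]);
    }
}
```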


I did this based on help found on this site and the hadoop tutorial.

mkdir urlcat_classes
javac -classpath /usr/lib/hadoop/hadoop-0.20.2-cdh3u1-core.jar -d urlcat_classes URLCat.java
jar -cvf urlcat.jar -C urlcat_classes .
hadoop jar urlcat.jar no.gnome.URLCat hdfs://localhost/user/claus/sample.txt

The no.gnome prefix comes from the 'package no.gnome;' declaration in URLCat.java.
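The class name given to hadoop jar has to match the entry path inside the jar, which javac -d and jar -C urlcat_classes . preserve (here no/gnome/URLCat.class). jar tf urlcat.jar lists those entries; the sketch below does the same listing in Java and converts each entry into the dotted name hadoop jar expects. JarClassNames is a hypothetical helper, and it skips nested classes (entries containing $) for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: list the class names that could be given to "hadoop jar <jar> <class>". */
final class JarClassNames {

    /** "no/gnome/URLCat.class" -> "no.gnome.URLCat"; null for non-class or nested-class entries. */
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class") || entryName.contains("$")) {
            return null;
        }
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    /** All top-level class names found in the jar, in entry order. */
    static List<String> classNames(String jarPath) {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = toClassName(entries.nextElement().getName());
                if (name != null) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Usage: java JarClassNames urlcat.jar
        for (String name : classNames(args[0])) {
            System.out.println(name);
        }
    }
}
```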

regards
Claus


Step 1: Compile the Java program:

javac URLCat.java -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0.jar

Step 2: Create a jar file:

jar cvf URLCat.jar URLCat.class

Step 3: Execute the program (give your own HDFS file location):

hadoop jar URLCat.jar URLCat hdfs://localhost:9000/pcode/wcinput.txt


Go to the directory where your compiled .class files are residing.

Use the fully qualified class name, including the package name, when running hadoop URLCat hdfs://localhost/user/tom/quangle.txt (see the question Receiving "wrong name" NoClassDefFoundError when executing a Java program from the command-line for how to work out the full class name and which directory to run the job from).

In my case URLCat.java was in the package com.tom.app, so the command was hadoop com.tom.app.URLCat hdfs://localhost/user/tom/quangle.txt.
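The fully qualified name is just the package declaration plus the class name, so it can be read off the source file. A rough sketch follows, assuming the public class is named after its file (as javac requires) and that the package line is not hidden inside a comment; FullyQualifiedName is a hypothetical, regex-based helper, not a real Java parser. The directory to run from (or to put on HADOOP_CLASSPATH) is the one that contains the top-level package directory, e.g. the directory holding com/ for com.tom.app.URLCat.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derive "com.tom.app.URLCat" from URLCat.java. */
final class FullyQualifiedName {

    // Matches e.g. "package com.tom.app;" at the start of a line.
    private static final Pattern PACKAGE =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    /** Combines the package declaration (if any) with the simple class name. */
    static String fromSource(String sourceText, String simpleName) {
        Matcher m = PACKAGE.matcher(sourceText);
        return m.find() ? m.group(1) + "." + simpleName : simpleName;
    }

    /** URLCat.java -> simple name "URLCat", prefixed with the file's package. */
    static String fromFile(Path javaFile) {
        String fileName = javaFile.getFileName().toString();
        if (!fileName.endsWith(".java")) {
            throw new IllegalArgumentException("not a .java file: " + fileName);
        }
        String simpleName = fileName.substring(0, fileName.length() - ".java".length());
        try {
            String text = new String(Files.readAllBytes(javaFile), StandardCharsets.UTF_8);
            return fromSource(text, simpleName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java FullyQualifiedName URLCat.java
        System.out.println(fromFile(Paths.get(args[0])));
    }
}
```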


We can access HDFS through the HDFS API. My understanding is that you can use the HDFS API to contact a Hadoop cluster running DFS and fetch data from it.

Why do we need to invoke the command as hadoop jar URLCat.jar?

Why not just java URLCat?

Why does the client necessarily need to install Hadoop and then contact the Hadoop cluster?
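Part of the answer is in URLCat itself: a plain JVM does not know the hdfs:// URL scheme. The book's URLCat registers Hadoop's FsUrlStreamHandlerFactory through URL.setURLStreamHandlerFactory, and that factory, along with the HDFS client code and configuration behind it, lives in the Hadoop jars that the hadoop command puts on the classpath. In principle java URLCat can work too if you assemble that classpath yourself (on Hadoop 2.x and later, hadoop classpath prints it). The JDK-only sketch below (SchemeCheck is a hypothetical name) shows that without such a handler an hdfs URL cannot even be constructed.

```java
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical check: can this JVM build a java.net.URL for the given address? */
final class SchemeCheck {

    static boolean hasHandler(String address) {
        try {
            // toURL() looks up a URLStreamHandler for the scheme; it does not connect.
            URI.create(address).toURL();
            return true;
        } catch (MalformedURLException e) {
            // No handler registered for the scheme, e.g. "unknown protocol: hdfs".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasHandler("file:///tmp/quangle.txt"));               // built into the JDK
        System.out.println(hasHandler("hdfs://localhost/user/tom/quangle.txt")); // needs Hadoop's handler
    }
}
```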
