How can I run Hadoop with a Java class?
I am following the book Hadoop: The Definitive Guide.
I am confused by Example 3-1.
There is a Java source file, URLCat.java.
I use javac to compile it into URLCat.class, then use jar to wrap it into a jar.
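For readers without the book at hand: URLCat simply opens a URL and copies its contents to standard output. The following is a stdlib-only approximation, not the book's listing, and it assumes a `file:` URL. Example 3-1 additionally registers Hadoop's `FsUrlStreamHandlerFactory` through `URL.setURLStreamHandlerFactory` so that `hdfs://` URLs can be opened, and it uses Hadoop's `IOUtils.copyBytes` instead of the manual copy loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

// Stdlib-only sketch of URLCat: open a URL and copy its bytes to an output stream.
// Without Hadoop's FsUrlStreamHandlerFactory registered, an hdfs:// URL fails in toURL()
// with a MalformedURLException (no protocol handler for "hdfs").
public class URLCat {

    // Copies everything readable from the URL to out, 4096 bytes at a time.
    static void cat(String url, OutputStream out) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed local file): java URLCat file:///tmp/quangle.txt
        cat(args[0], System.out);
    }
}
```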
The book says to use
% hadoop URLCat hdfs://localhost/user/tom/quangle.txt
to run it. I have tried a lot of different ways, such as
% hadoop jar URLCat.jar .......
but it didn't work. I got errors like this:
Exception in thread "main" java.lang.ClassNotFoundException: hdfs://localhost/user/username/quangle/txt
What is the reason for this, and how do I do it right?
It's quite simple:
[me@myhost ~]$ hadoop jar
RunJar jarFile [mainClass] args...
So, what you want is hadoop jar yourJar.jar your.class.with.Main [any args]
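Here is a rough sketch of the argument handling implied by that usage line. It assumes the behavior described in the answers here (a `Main-Class` in the jar manifest takes precedence, otherwise the first argument after the jar is taken as the class name); `RunJarSketch` and `resolve` are hypothetical names, not Hadoop's source. It shows why `hadoop jar URLCat.jar hdfs://...` fails with `ClassNotFoundException: hdfs://...`: the URL ends up being used as the class name.

```java
import java.util.Arrays;

// Hypothetical model of how `hadoop jar` picks the class to run (not Hadoop's actual code).
public class RunJarSketch {

    // The class to load and the arguments handed to its main method.
    static final class Resolved {
        final String mainClass;
        final String[] mainArgs;

        Resolved(String mainClass, String[] mainArgs) {
            this.mainClass = mainClass;
            this.mainArgs = mainArgs;
        }
    }

    // manifestMainClass: the jar's Main-Class attribute, or null if the manifest has none.
    // argsAfterJar: the command-line arguments that follow the jar file name.
    static Resolved resolve(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null) {
            return new Resolved(manifestMainClass, argsAfterJar);
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("RunJar jarFile [mainClass] args...");
        }
        // No Main-Class: the first argument is treated as a class name, whatever it looks like.
        return new Resolved(argsAfterJar[0],
                Arrays.copyOfRange(argsAfterJar, 1, argsAfterJar.length));
    }

    public static void main(String[] args) {
        Resolved r = resolve(null, new String[] {"hdfs://localhost/user/tom/quangle.txt"});
        // The URL becomes the "class name", so loading it throws ClassNotFoundException.
        System.out.println(r.mainClass + " " + Arrays.toString(r.mainArgs));
    }
}
```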
Of course you could just use cat, but that isn't really the point (i.e. you're learning, not just trying to get it to work).
As per the book, you need to set your HADOOP_CLASSPATH environment variable. In my case, using the build example in the book, all of my classes are at: /media/data/hadefguide/book/build/classes
Here's an example:
hduser@MuleBox ~ $ export HADOOP_CLASSPATH=
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
Exception in thread "main" java.lang.NoClassDefFoundError: URLCat
Caused by: java.lang.ClassNotFoundException: URLCat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: URLCat. Program will exit.
hduser@MuleBox ~ $ export HADOOP_CLASSPATH=/media/data/hadefguide/book/build/classes
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
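To check whether a class is reachable from HADOOP_CLASSPATH before running `hadoop`, a small helper along these lines may help. It is a hypothetical sketch, not a Hadoop tool: it converts a class name into the path a class loader looks for (for example `no.gnome.URLCat` becomes `no/gnome/URLCat.class`) and tests each classpath entry, whether a directory or a jar. With the empty HADOOP_CLASSPATH from the first run in the transcript, it reports the class as not found.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hadoop): reports which classpath entry, if any, holds a class.
public class ClasspathCheck {

    // "no.gnome.URLCat" -> "no/gnome/URLCat.class", the resource path a class loader searches for.
    static String classFilePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // classpath uses the platform separator (':' on Linux), as HADOOP_CLASSPATH does.
    // Returns the first entry containing the class, or null if no entry does.
    static String findEntry(String classpath, String className) throws IOException {
        String rel = classFilePath(className);
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // e.g. an empty HADOOP_CLASSPATH
            }
            File f = new File(entry);
            if (f.isDirectory()) {
                if (new File(f, rel).isFile()) {
                    return entry;
                }
            } else if (f.isFile() && entry.endsWith(".jar")) {
                try (JarFile jar = new JarFile(f)) {
                    if (jar.getEntry(rel) != null) {
                        return entry;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClasspathCheck URLCat   (reads the HADOOP_CLASSPATH environment variable)
        String cp = System.getenv("HADOOP_CLASSPATH");
        String where = findEntry(cp == null ? "" : cp, args[0]);
        System.out.println(where == null ? args[0] + " not found" : args[0] + " found in " + where);
    }
}
```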
Not sure how useful this answer is now. I faced the same issue today, in fact while working on an example from the same book (Hadoop: The Definitive Guide), and I was able to execute an example program as follows:
1. Write your Java code and save it as a .java file.
2. Compile your Java program using:
   javac -classpath <path to hadoop core and commons-cli jar file> <path to your java program file>
3. Create a jar file containing your class files:
   jar cvf <jar file> <class files to add, separated by spaces>
4. Execute the jar file using the hadoop command line:
   hadoop jar <jar file name> <class name containing your main method> <argument to the main method>
e.g.
hadoop jar FileSystemCat.jar FileSystemCat hdfs://localhost/user/root/MyFiles/meet_a_seer.txt
Hope it helps
The syntax of the command is a little bit different:
hadoop fs -cat hdfs:///user/tom/quangle.txt
Do you have the Hadoop home directory in your PATH? Can you call hadoop without any parameters?
To make the hadoop URLCat command work, you need to get the jar (URLCat.jar) onto your classpath. You can put it in Hadoop's lib/ directory for that.
For hadoop jar URLCat.jar to run on its own, you need to create a jar that has a Main-Class defined in its manifest; otherwise hadoop treats the next argument on the command line as the class name. What you can try is hadoop jar URLCat.jar URLCat hdfs://...
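If you want `hadoop jar URLCat.jar hdfs://...` to work without naming the class, the jar's manifest needs a `Main-Class` attribute. The JDK `jar` tool can set it with the `e` option (for example `jar cvfe URLCat.jar URLCat URLCat.class`); the sketch below does the same with `java.util.jar` to show what ends up in the manifest. `MainClassJar`, `write` and `mainClassOf` are illustrative names, and the class files are assumed to already exist under the javac output directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch: package compiled classes into a jar whose manifest names the entry point.
public class MainClassJar {

    // classesDir is the javac output directory; relClassFiles are paths inside it, e.g. "URLCat.class".
    static void write(Path jarPath, String mainClass, Path classesDir, String... relClassFiles)
            throws IOException {
        Manifest mf = new Manifest();
        // Without Manifest-Version, Manifest.write drops the other main attributes (including Main-Class).
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try (OutputStream fos = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(fos, mf)) {
            for (String rel : relClassFiles) {
                jar.putNextEntry(new JarEntry(rel)); // jar entry names use '/' separators
                jar.write(Files.readAllBytes(classesDir.resolve(rel)));
                jar.closeEntry();
            }
        }
    }

    // Reads back the Main-Class attribute, i.e. the class `hadoop jar` would run; null if absent.
    static String mainClassOf(Path jarPath) throws IOException {
        try (JarFile jf = new JarFile(jarPath.toFile())) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }
}
```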
I did this based on help found on this site and the Hadoop tutorial.
mkdir urlcat_classes
javac -classpath /usr/lib/hadoop/hadoop-0.20.2-cdh3u1-core.jar -d urlcat_classes URLCat.java
jar -cvf urlcat.jar -C urlcat_classes .
hadoop jar urlcat.jar no.gnome.URLCat hdfs://localhost/user/claus/sample.txt

no.gnome is from 'package no.gnome;' in URLCat.java.
regards
Claus
Step 1: Compile the Java program:
javac URLCat.java -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0.jar
Step 2: Create a jar file:
jar cvf URLCat.jar URLCat.class
Step 3: Execute the program (use your own HDFS file location):
hadoop jar URLCat.jar URLCat hdfs://localhost:9000/pcode/wcinput.txt
Go to the directory where your compiled .class files reside.
Use the full class name, including the package name (refer to Receiving "wrong name" NoClassDefFoundError when executing a Java program from the command-line for the full class name or which directory to run the job in), when running hadoop URLCat hdfs://localhost/user/tom/quangle.txt.
In my case URLCat.java was in com.tom.app, so the hadoop command was hadoop com.tom.app.URLCat hdfs://localhost/user/tom/quangle.txt.
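The package-to-directory mapping also works in reverse: the fully qualified name to pass to `hadoop` follows from where javac placed the .class file relative to its output root. The helper below is a hypothetical sketch (not part of Hadoop or the JDK tools), assuming a standard `-d` output layout such as `build/classes/com/tom/app/URLCat.class`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: derive the fully qualified class name from a .class file's location
// under the javac output root, e.g. build/classes/com/tom/app/URLCat.class -> com.tom.app.URLCat.
public class ClassNameFromPath {

    static String fullyQualifiedName(Path classesRoot, Path classFile) {
        Path rel = classesRoot.toAbsolutePath().normalize()
                .relativize(classFile.toAbsolutePath().normalize());
        // Each directory level below the root is one package segment.
        String s = rel.toString().replace(rel.getFileSystem().getSeparator(), ".");
        if (!s.endsWith(".class")) {
            throw new IllegalArgumentException("not a .class file: " + classFile);
        }
        return s.substring(0, s.length() - ".class".length());
    }

    public static void main(String[] args) {
        // Usage: java ClassNameFromPath build/classes build/classes/com/tom/app/URLCat.class
        System.out.println(fullyQualifiedName(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```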
We can access HDFS through the HDFS API. My understanding is that you can use the HDFS API to contact a Hadoop cluster running DFS and fetch data from it.
Why do we need to invoke the command as hadoop jar URLCat.jar? Why not just java URLCat?
Why does the client necessarily need to install Hadoop and then contact the Hadoop cluster?