hadoop - Where are input/output files stored in hadoop and how to execute java file in hadoop?
Suppose I write a Java program and I want to run it in Hadoop:
- where should the file be saved?
- how do I access it from Hadoop?
- should I be calling it with the following command?
hadoop classname
- what is the command in Hadoop to execute the Java file?
The simplest answers I can think of to your questions are:
1) Anywhere
2, 3, 4) $HADOOP_HOME/bin/hadoop jar [path_to_your_jar_file]
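To make that concrete, compiling and packaging might look like the following sketch (class name, jar name, and HDFS paths are placeholders; `hadoop classpath` prints the Hadoop library classpath):

```shell
# compile against the Hadoop libraries
javac -classpath "$(hadoop classpath)" -d classes MyJob.java

# package the compiled classes into a jar
jar cf myjob.jar -C classes .

# submit the job to the cluster
$HADOOP_HOME/bin/hadoop jar myjob.jar MyJob /input /output
```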
A similar question was asked here Executing helloworld.java in apache hadoop
It may seem complicated, but it's simpler than you might think!
- Compile your map/reduce classes and your main class into a jar. Let's call this jar myjob.jar.
- This jar does not need to include the Hadoop libraries, but it should include any other dependencies you have.
- Your main method should set up and run your map/reduce job; here is an example.
- Put this jar on any machine with the hadoop command line utility installed.
- Run your main method using the hadoop command line utility:
hadoop jar myjob.jar
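A minimal driver sketch for the main method described above, using Hadoop's `mapreduce` Job API (the `MyMapper` and `MyReducer` classes are hypothetical placeholders for your own map/reduce classes, and the Hadoop libraries must be on the compile classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my job");
        job.setJarByClass(MyJob.class);        // tells Hadoop which jar to ship to the cluster
        job.setMapperClass(MyMapper.class);    // placeholder: your mapper class
        job.setReducerClass(MyReducer.class);  // placeholder: your reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input dir in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir; must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```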
Hope that helps.
- where should the file be saved?
The data should be saved in HDFS (the Hadoop Distributed File System). You will probably want to load it into the cluster from your data source using something like Apache Flume. The file can be placed anywhere in HDFS, but the most common home directory is /user/hadoop/.
- how to access it from hadoop?
SSH into the Hadoop cluster's head node as you would any standard Linux server.
To list the root of HDFS:
hadoop fs -ls /
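Beyond listing the root, a few other common HDFS shell commands (a sketch; the file and directory names are illustrative):

```shell
hadoop fs -mkdir -p /user/hadoop/input           # create a directory in HDFS
hadoop fs -put data.txt /user/hadoop/input/      # copy a local file into HDFS
hadoop fs -ls /user/hadoop/input                 # list the directory
hadoop fs -cat /user/hadoop/input/data.txt       # print a file's contents
hadoop fs -get /user/hadoop/input/data.txt .     # copy a file back to local disk
```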
- should i be calling it by the following command?
hadoop classname
You should be using the hadoop command to access your data and run your programs; try hadoop help for a list of subcommands.
- what is the command in hadoop to execute the java file?
hadoop jar MyJar.jar com.mycompany.MainDriver arg[0] arg[1] ...