How to get names of the currently running hadoop jobs?
I need to get the list of job names that currently running, but hadoop -job list
give me a list of jobIDs.
- Is there a way to get names of the running jobs?
- Is there a way to get the job names from j开发者_StackOverflow中文版obIDs?
I've had to do this a number of times so I came up with the following command line that you can throw in a script somewhere and reuse. It prints the jobid followed by the job name.
hadoop job -list | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "hadoop job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -XGET {} | grep 'Job Name' | sed 's/.* //' | sed 's/<br>//'"
If you use Hadoop YARN don't use mapred job -list
(or its deprecated version hadoop job -list
) just do
yarn application -appStates RUNNING -list
That also prints out the application/job name. For mapreduce applications you can get the corresponding JobId
by replacing the application
prefix of the Application-Id
with job
.
Modifying AnthonyF's script, you can use the following on Yarn:
mapred job -list 2> /dev/null | egrep '^\sjob' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null | egrep 'mapreduce.job.name' | sed 's/.*<value>//' | sed 's/<\/value>.*//'"
If you do $HADOOP_HOME/bin/hadoop -job -status <jobid>
you will get a tracking URL in the output. Going to that URL will give you the tracking page, which has the name
Job Name: <job name here>
The -status
command also gives a file, which can also be seen from the tracking URL. In this file is a mapred.job.name
which has the job name.
I didn't find a way to access the job name from the command line. Not to say there isn't... but not found by me. :)
The tracking URL and xml file are probably your best options for getting the job name.
You can find the information in JobTracker
UI
You can see
Jobid
Priority
User
Name of the job
State of the job whether it succeed or failed
Start Time
Finish Time
Map % Complete
Reduce % Complete etc
INFO
Just In case any one interested in latest query to get the Job Name :-). Modified Pirooz Command -
mapred job -list 2> /dev/null | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File'" | awk '{print $3}' | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null" | egrep 'mapreduce.job.name' | awk -F"" '{print $2}' | awk -F "" '{print $1}'
I needed to look through history, so I changed mapred job -list
to mapred job -list all
....
I ended up adding a -L
to the curl command, so the block there was:
curl -s -L -XGET {}
This allows for redirection, such as if the job is retired and in the job history. I also found that it's JobName in the history HTML, so I changed the grep:
grep 'Job.*Name'
Plus of course changing hadoop
to mapred
. Here's the full command:
mapred job -list all | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -L -XGET {} | grep 'Job.*Name' | sed 's/.* //' | sed 's/<br>//'"
(I also changed around the first grep so that I was only looking at a certain username....YMMV)
by typing "jps" in your terminal .
精彩评论