How to get names of the currently running hadoop jobs?

2023-03-02 18:38

I need to get the list of job names that are currently running, but hadoop job -list gives me a list of jobIDs.

Is there a way to get names of the running jobs?
Is there a way to get the job names from jobIDs?

I've had to do this a number of times so I came up with the following command line that you can throw in a script somewhere and reuse. It prints the jobid followed by the job name.

hadoop job -list | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "hadoop job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -XGET {} | grep 'Job Name' | sed 's/.* //' | sed 's/<br>//'"

If you use Hadoop YARN, don't use mapred job -list (or its deprecated version, hadoop job -list); just run:

yarn application -appStates RUNNING -list

That also prints the application/job name. For MapReduce applications, you can get the corresponding JobId by replacing the application prefix of the Application-Id with job.
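As a rough sketch of that prefix replacement, the helpers below turn an Application-Id into a JobId and print "jobid name" pairs from the listing. The function names app_to_job and list_job_names are made up for illustration, and the parsing assumes the listing is tab-separated with Application-Id in the first column and Application-Name in the second, which may differ between Hadoop versions.

```shell
# Convert a YARN Application-Id to the matching MapReduce JobId
# by swapping the "application_" prefix for "job_".
app_to_job() {
  printf '%s\n' "$1" | sed 's/^application_/job_/'
}

# Read `yarn application -list` output on stdin and print "<jobid> <name>".
# Assumption: columns are tab-separated, Application-Id first, name second.
list_job_names() {
  awk -F'\t' '/^[ \t]*application_/ {
    id = $1; name = $2
    gsub(/^[ \t]+|[ \t]+$/, "", id)      # trim column padding
    gsub(/^[ \t]+|[ \t]+$/, "", name)
    sub(/^application_/, "job_", id)     # Application-Id -> JobId
    print id, name
  }'
}
```

It could be used as: yarn application -appStates RUNNING -list 2>/dev/null | list_job_names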

Modifying AnthonyF's script, you can use the following on YARN:

mapred job -list 2> /dev/null | egrep '^\sjob' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null | egrep 'mapreduce.job.name' | sed 's/.*<value>//' | sed 's/<\/value>.*//'"

If you run $HADOOP_HOME/bin/hadoop job -status <jobid>, you will get a tracking URL in the output. Going to that URL will give you the tracking page, which shows the name:

Job Name: <job name here>

The -status command also gives the path of a job file, which can also be reached from the tracking URL. This file contains a mapred.job.name property, which holds the job name.

I didn't find a way to access the job name directly from the command line. That's not to say there isn't one, just that I haven't found it. :)

The tracking URL and xml file are probably your best options for getting the job name.
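For the XML file route, a minimal sketch of pulling the name out of a job file might look like the following. The function name job_name_from_xml is hypothetical, and it assumes each property sits on a single line with name before value; it checks both mapred.job.name and the newer mapreduce.job.name, since which one is present depends on the Hadoop version.

```shell
# Print the job name found in a Hadoop job configuration XML read from stdin.
# Assumption: each <property> is on one line, with <name> before <value>.
job_name_from_xml() {
  grep -E '<name>(mapreduce|mapred)\.job\.name</name>' |
    sed -e 's/.*<value>//' -e 's/<\/value>.*//' |  # keep only the value text
    head -n 1                                       # first match is enough
}
```

Given the job file path reported by -status, it could be fed with hadoop fs -cat, for example: hadoop fs -cat <job file> 2>/dev/null | job_name_from_xml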

You can find the information in the JobTracker UI.

You can see:

Jobid
Priority
User
Name of the job
State of the job (whether it succeeded or failed)
Start Time
Finish Time
Map % Complete
Reduce % Complete, etc.


In case anyone is interested in an up-to-date query to get the job name :-), here is a modified version of Pirooz's command:

mapred job -list 2> /dev/null | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File'" | awk '{print $3}' | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null" | egrep 'mapreduce.job.name' | awk -F"<value>" '{print $2}' | awk -F "</value>" '{print $1}'

I needed to look through history, so I changed mapred job -list to mapred job -list all.

I ended up adding a -L to the curl command, so the block there was:

curl -s -L -XGET {}

This allows for redirection, such as if the job is retired and in the job history. I also found that it's JobName in the history HTML, so I changed the grep:

grep 'Job.*Name'

Plus of course changing hadoop to mapred. Here's the full command:

mapred job -list all | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -L -XGET {} | grep 'Job.*Name' | sed 's/.* //' | sed 's/<br>//'"

(I also changed the first grep so that I was only looking at a certain username. YMMV.)

By typing "jps" in your terminal.
