Bash GNU Parallel help
It's about GNU Parallel (http://en.wikipedia.org/wiki/Parallel_(software)) and its very rich man page (http://www.gnu.org/software/parallel/man.html). According to the man page, this:
(for x in `cat list` ; do
  do_something $x
done) | process_output
is replaced by this:
cat list | parallel do_something | process_output
I am trying to apply that to this loop:
while [ "$n" -gt 0 ]
do
  percentage=$(echo "scale=2;(100-(($n / $end) * 100))" | bc -l)
  # get url from line specified by n from file done1
  nextUrls=`sed -n "${n}p" < done1`
  echo -ne "${percentage}% $n / $end urls saved, going to line 1. current: $nextUrls\r"
  # function that gets links from the url
  getlinks $nextUrls
  # save n
  echo $n > currentLine
  let "n--"
  let "end=`cat done1 |wc -l`"
done
While reading the documentation for GNU Parallel I found out that functions are not supported, so getlinks won't work when called from parallel.
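One thing that might work around that (this is only a sketch I have not tested, and it assumes bash is the shell parallel ends up using to run the jobs) would be exporting the function so the shells parallel spawns can see it:

export -f getlinks
cat done1 | parallel getlinks

but I do not know whether that actually works with my version of parallel.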
The best I have found so far is:
seq 30 | parallel -n 4 --colsep ' ' echo {1} {2} {3} {4}
which produces this output:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30
The while loop mentioned above should go something like this, if I am right:
end=`cat done1 |wc -l`
seq $end -1 1 | parallel -j+4 -k
#(everything except the getlinks function goes here, but I don't know how?) |
# every time it finishes do
getlinks $nextUrls
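Roughly what I have in mind, as an untested sketch: move the loop body into a helper script (process_line.sh is just a name I made up) and let parallel feed it the line numbers:

#!/bin/bash
# process_line.sh (made-up name) - holds the old loop body for one line number
n=$1
end=`cat done1 | wc -l`
# get url from line specified by n from file done1
nextUrls=`sed -n "${n}p" < done1`
# getlinks would also have to be a separate script or an exported function to be visible here
getlinks "$nextUrls"

which would then be driven by something like:

chmod +x process_line.sh
end=`cat done1 | wc -l`
seq $end -1 1 | parallel -k -j+4 ./process_line.sh {}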
Thanks in advance for any help.
It seems what you want is a progress meter. Try:
cat done1 | parallel --eta wget
If that is not what you want, look at sem (sem is an alias for parallel --semaphore and is normally installed with GNU Parallel):
for i in `ls *.log` ; do
  echo $i
  sem -j+0 gzip $i ";" echo done
done
sem --wait
In your case it will be something like:
while [ "$n" -gt 0 ]
do
  percentage=$(echo "scale=2;(100-(($n / $end) * 100))" | bc -l)
  # get url from line specified by n from file done1
  nextUrls=`sed -n "${n}p" < done1`
  echo -ne "${percentage}% $n / $end urls saved, going to line 1. current: $nextUrls\r"
  # function that gets links from the url
  THE_URL=`getlinks $nextUrls`
  sem -j10 wget $THE_URL
  # save n
  echo $n > currentLine
  let "n--"
  let "end=`cat done1 |wc -l`"
done
sem --wait
echo All done
Why does getlinks need to be a function? Take the function and turn it into a shell script. It should be essentially identical, except that you need to export environment variables into it, and of course you cannot affect the outside environment without a lot of work.
Of course, you cannot save $n into currentLine while you are executing in parallel: all the parallel jobs would be overwriting the file at the same time.
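A rough sketch of that transformation (getlinks.sh is just an assumed file name, and the one-line body is only a stand-in for whatever your real function does):

#!/bin/bash
# getlinks.sh - standalone replacement for the getlinks function
# the URL arrives as $1; any variable it needs from the caller must be exported there
lynx -image_links -dump "$1" | grep -i http >> done1

called from the main script roughly like this:

chmod +x getlinks.sh
export end                      # example: export anything the child script would read
sem -j10 ./getlinks.sh "$nextUrls"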
I was thinking of making something more like the code below (with something other than parallel or sem if needed, since parallel does not support functions, see http://www.gnu.org/software/parallel/man.html#aliases_and_functions_do_not_work):
getlinks(){
  if [ -n "$1" ]
  then
    lynx -image_links -dump "$1" > src
    grep -i ".jpg" < src > links1
    grep -i "http" < links1 > links
    sed -e 's/.*\(http\)/http/g' < links >> done1
    sort -f done1 > done2
    uniq done2 > done1
    rm -rf links1 links src done2
  fi
}
func(){
  n=$1
  percentage=$(echo "scale=2;(100-(($n / $end) * 100))" | bc -l)
  # get url from line specified by n from file done1
  nextUrls=`sed -n "${n}p" < done1`
  echo -ne "${percentage}% $n / $end urls saved, going to line 1. current: $nextUrls\r"
  # function that gets links from the url
  getlinks $nextUrls
  # save n
  echo $n > currentLine
  let "end=`cat done1 |wc -l`"
}
while [ "$n" -gt 0 ]
do
  sem -j10 func $n
  let "n--"
done
sem --wait
echo All done
My script has become really complex, and I do not want to make a feature unavailable by switching to something I am not sure can be done. This way I can fetch links with the full internet bandwidth in use, which should take less time.
I tried sem:
#!/bin/bash
func (){
  echo 1
  echo 2
}
for i in `seq 10`
do
  sem -j10 func
done
sem --wait
echo All done
and I get these errors:
Can't exec "func": No such file or directory at /usr/share/perl/5.10/IPC/Open3.pm line 168.
open3: exec of func failed at /usr/local/bin/sem line 3168
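I guess that means sem can only exec a real program, not a shell function, so the body would have to live in its own file. An untested sketch (func.sh is just a name I made up):

#!/bin/bash
# func.sh - same body as func() above, but as an executable script
echo 1
echo 2

and then:

chmod +x func.sh
for i in `seq 10`
do
  sem -j10 ./func.sh
done
sem --wait
echo All done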
It is not quite clear what the end goal of your script is. If you are trying to write a parallel web crawler, you might be able to use the below as a template.
#!/bin/bash
# E.g. http://gatt.org.yeslab.org/
URL=$1
# Stay inside the start dir
BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
URLLIST=$(mktemp urllist.XXXX)
URLLIST2=$(mktemp urllist.XXXX)
SEEN=$(mktemp seen.XXXX)
# Spider to get the URLs
echo $URL >$URLLIST
cp $URLLIST $SEEN
while [ -s $URLLIST ] ; do
  cat $URLLIST |
    parallel lynx -listonly -image_links -dump {} \; wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
    perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and do { $seen{$1}++ or print }' |
    grep -F $BASEURL |
    grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
  mv $URLLIST2 $URLLIST
done
rm -f $URLLIST $URLLIST2 $SEEN
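Assuming the template above is saved as, say, spider.sh, it would be run like this (the URL is the example from the comment at the top of the script):

chmod +x spider.sh
./spider.sh http://gatt.org.yeslab.org/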