Pipeline For Downloading and Processing Files In Unix/Linux Environment With Perl
I have a list of file URLs that I want to download:
http://somedomain.com/foo1.gz
http://somedomain.com/foo2.gz
http://somedomain.com/foo3.gz
What I want to do is the following for each file:
- Download foo1, foo2, ... in parallel with wget and nohup.
- Every time a download completes, process the file with myscript.sh.
What I have is this:
#!/usr/bin/perl
my @files = glob("foo*.gz");
foreach my $file (@files) {
    my $downurls = "http://somedomain.com/" . $file;
    system("nohup wget $file &");
    system("./myscript.sh $file >> output.txt");
}
The problem is that the pipeline above can't tell when a file has finished downloading, so myscript.sh doesn't get executed properly.
What's the right way to achieve this?
Why do this in Perl? Use bash instead. Below is just a sample.
#!/bin/bash
for file in foo1 foo2 foo3
do
    wget http://somedomain.com/$file.gz
    if [ -f $file.gz ]; then
        ./myscript.sh $file.gz >> output.txt
    fi
done
Try combining the commands using &&, so that the 2nd one runs only after the 1st one completes successfully.
system("(nohup wget $file && ./myscript.sh $file >> output.txt) &");
If you want parallel processing, you can do it yourself with fork, or use a module to handle it for you. Try Parallel::ForkManager. You can see a bit more on its usage in How can I manage a fork pool in Perl?, but the module's CPAN page will have the really useful info. You probably want something like this:
use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 8; # 8 parallel processes max
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

my @files = glob("foo*.gz");

foreach my $file (@files) {
    # Forks and returns the pid for the child:
    my $pid = $pm->start and next;

    my $downurls = "http://somedomain.com/" . $file;
    system("wget $downurls");
    system("./myscript.sh $file >> output.txt");

    $pm->finish; # Terminates the child process
}

$pm->wait_all_children; # Wait for every child to finish
print "All done!\n";