Continue getting a partially-downloaded file
Is there a Perl-tool, which behaves like wget --continue
and is able to continue getting a partially-downl开发者_StackOverflowoaded file?
AnyEvent::HTTP
's documentation contains code that uses HTTP 1.1's ability to resume a download. I never used it though, so I can't comment on its suitability.
Apparently the example expects you to already know how to use AnyEvent
, which, of course, I did not know. You need to modify the code to have the event loop it expected to already be in place:
#!/usr/bin/perl
use strict;
use warnings;
use AnyEvent::HTTP;
my $url = "http://localhost/foo.txt";
my $file = "foo.txt";
sub download {
my ($url, $file, $cb) = @_;
open my $fh, "+>>:raw", $file
or die "could not open $file: $!";
my %hdr;
my $ofs = 0;
if (stat $fh and $ofs = -s _) {
$hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date((stat _)[9]);
$hdr{"range"} = "bytes=$ofs-";
}
http_get $url, (
headers => \%hdr,
on_header => sub {
my ($hdr) = @_;
if ($hdr->{Status} == 200 && $ofs) {
# resume failed
truncate $fh, $ofs = 0;
}
sysseek $fh, $ofs, 0;
return 1;
},
on_body => sub {
my ($data, $hdr) = @_;
if ($hdr->{Status} =~ /^2/) {
length $data == syswrite $fh, $data
or return; # abort on write errors
}
return 1;
},
sub {
my (undef, $hdr) = @_;
my $status = $hdr->{Status};
if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
utime $fh, $time, $time;
}
if ($status == 200 || $status == 206 || $status == 416) {
# download ok || resume ok || file already fully downloaded
$cb->(1, $hdr);
} elsif ($status == 412) {
# file has changed while resuming, delete and retry
unlink $file;
$cb->(0, $hdr);
} elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
# retry later
$cb->(0, $hdr);
} else {
$cb->(undef, $hdr);
}
}
);
}
my $quit = AnyEvent->condvar; #create a handle to exit the loop
download $url, $file, sub {
if ($_[0]) {
print "OK!\n";
} elsif (defined $_[0]) {
print "please retry later\n";
} else {
print "ERROR\n";
}
$quit->send; #quit the loop
};
$quit->recv; #start the loop
The key to making it work is the $quit
conditional variable:
my $quit = AnyEvent->condvar; #handle to exit the loop
.
.
.
$quit->recv;
This sets up an event loop. Without the event loop the program exits before the call to http_get
gets a chance to do anything but create the file. To exit the event loop we call $quit->send
in the callback to the download
function.
Tried it (2 times) and it worked.
#!/usr/local/bin/perl
use warnings;
use 5.014;
use utf8;
use LWP::UserAgent;
use File::Basename;
my $url = 'http://server/somelargefile';
my $file = basename $url;
my $ua = LWP::UserAgent->new( show_progress => 1 );
open my $fh, '>>:raw', $file or die $!;
my $bytes = -s $file;
my $res;
if ( $bytes ) {
say "resume download: $file ($bytes)";
$res = $ua->get(
$url,
'Range' => "bytes=$bytes-",
':content_cb' => sub { my ( $chunk ) = @_; print $fh $chunk; }
);
} else {
say "start download";
$res = $ua->get(
$url,
':content_cb' => sub { my ( $chunk ) = @_; print $fh $chunk; }
);
}
close $fh;
my $status = $res->status_line;
if ( $status =~ /^(200|206|416)/ ) {
say "OK" if $status =~ /^20[06]/;
say "$file already downloaded" if $status =~ /^416/;
} else {
say $status;
}
I found this with some help from Google (needed specific search terms, not saying GIYF). Link to Google Translate.
The example code they have there is
use strict;
use LWP::UserAgent;
my $u = "http://www.mangafox.com/media/manga.banner.png";
my $f = "tmp.jpg";
my $ua = LWP::UserAgent->new();
## Uncomment for test
# unlink $f;
# system("cp tmp.jpg tmp0.jpg");
# system( "head -c 10000 tmp0.jpg > tmp.jpg" );
download($u, $f);
sub download {
my ($url, $file) = @_;
my ($tries, @parameters, $FD);
@parameters = ( $url,
":content_cb" => sub { my ($chunk) = @_;
print $FD $chunk;
}
);
$tries = 4;
while ( $tries ) {
open($FD, ">>$file") || die "ERROR: $!";
my $bytes = -s $file;
if ( $bytes > 0 ) { push(@parameters, "Range" => "bytes=$bytes-" ) }
my $res =$ua->get( @parameters );
print $res->status_line . "\n";
close $FD;
# 416 Requested Range Not Satisfiable
# (file already fully downloaded)
if ( $res->is_success || $res->code == 416 ) { return }
$tries --;
}
die "ERROR: download $url";
}
# Test with ImageMagick
system("identify tmp.jpg");
I am not saying this works or is right for you. Just something I found. Use at your own risk.
精彩评论