How can I make WWW:Mechanize to not fetch pages twice?

2022-12-24 20:52 问答作者：

I have a web scraping application, written in OO Perl. There's single WWW::Mechanize object used in the app. How can I make it to not fetch the same URL twice, i.e. make the second get() with the same URL no-op:

my $mech = WWW::Mechanize->new();
my $url = 'http:://google.com';

$mech->ge开发者_运维技巧t( $url ); # first time, fetch
$mech->get( $url ); # same url, do nothing

See WWW::Mechanize::Cached:

Synopsis

use WWW::Mechanize::Cached;

my $cacher = WWW::Mechanize::Cached->new;
$cacher->get( $url );

Description

Uses the Cache::Cache hierarchy to implement a caching Mech. This lets one perform repeated requests without hammering a server impolitely.

You can store the URLs and their content in a hash.

my $mech = WWW::Mechanize->new();
my $url = 'http://google.com';
my %response;

$response{$url} = $mech->get($url) unless $response{$url};

You can subclass WWW::Mechanize and redefine the get() method to do what you want:

package MyMech;
use base 'WWW::Mechanize';

sub get {
    my $self = shift;
    my($url) = @_;

    if (defined $self->res && $self->res->request->uri ne $url) {
        return $self->SUPER::get(@_)
    }
    return $self->res;
}

继续阅读：perl www-mechanize

How can I make WWW:Mechanize to not fetch pages twice?

Synopsis

Description

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Synopsis

Description

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？