
crawl a website for data at frequent intervals

I need to crawl a website and retrieve certain data that keeps getting updated every few minutes. How do I do this?


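Use WWW::Mechanize for crawling. Its mirror method, inherited from LWP::UserAgent, saves a page to a local file and sends an If-Modified-Since header when that file already exists, so an unchanged page is not downloaded again.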
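A minimal sketch of the mirror approach. The helper name refresh_copy, the URL and the file name page.html are made up for illustration. It also assumes you prefer to handle failures yourself, hence autocheck => 0:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0: a failed request returns a response object instead of dying.
my $mech = WWW::Mechanize->new(autocheck => 0);

# Fetch $url into $file and report what happened.
# mirror() adds If-Modified-Since when $file already exists.
sub refresh_copy {
    my ($url, $file) = @_;
    my $response = $mech->mirror($url, $file);
    return 'unchanged' if $response->code == 304;   # 304 Not Modified
    return 'updated'   if $response->is_success;    # fresh copy written to $file
    return 'failed: ' . $response->status_line;
}

# Hypothetical URL and file name, for illustration only.
print refresh_copy('http://www.example.com/', 'page.html'), "\n";
```

Call refresh_copy from a loop with a sleep between iterations and parse page.html only when it returns 'updated'. This only saves bandwidth if the server honours If-Modified-Since; many dynamic pages do not.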


Use sleep to control the wait between requests, and WWW::Mechanize to retrieve the data:

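use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0: a failed request should not kill the polling loop
my $mech = WWW::Mechanize->new(autocheck => 0);
my $url = "http://www.nytimes.com";  # a sample webpage
while (1) {
    $mech->get($url);
    if ($mech->success) {
        # format => 'text' strips the HTML markup (requires HTML::TreeBuilder);
        # read docs for WWW::Mechanize for advanced content processing
        print $mech->content(format => 'text');
    } else {
        warn "GET $url failed: ", $mech->res->status_line, "\n";
    }
    sleep 300;  # wait for 5 minutes
}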

EDIT: improved the sample content retrieval process.
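Since the page only changes every few minutes, it may help to act only when the retrieved text differs from the previous poll. A rough sketch, assuming a digest comparison is good enough for that; has_changed is a hypothetical helper, not part of WWW::Mechanize:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

my $last_digest;   # digest of the text seen on the previous call, if any

# Return 1 when $text differs from the text passed on the previous call.
sub has_changed {
    my ($text) = @_;
    # md5_hex works on bytes, so encode the decoded page text first.
    my $digest = md5_hex(encode_utf8($text));
    return 0 if defined $last_digest && $digest eq $last_digest;
    $last_digest = $digest;
    return 1;
}
```

Inside the polling loop, store the result of $mech->content(format => 'text') in a variable and print it only when has_changed returns true. Pages that embed timestamps or rotating ads will look changed on every poll, so in practice you would extract just the data you care about before hashing.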
