Crawl a website for data at frequent intervals
I need to crawl a website and retrieve certain data that keeps getting updated every few minutes. How do I do this?
Load WWW::Mechanize for crawling. To save a local copy of a page, use the mirror method it inherits from LWP::UserAgent. Use sleep to control the wait period between requests, and use WWW::Mechanize's content method for data retrieval:
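use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = "http://www.nytimes.com"; # a sample webpage

while (1) {
    # get() dies on HTTP errors by default (autocheck), so trap failures to keep polling
    eval {
        $mech->get($url);
        # format => 'text' needs HTML::TreeBuilder; read docs for WWW::Mechanize for advanced content processing
        print $mech->content(format => 'text');
        1;
    } or warn "Fetch failed: $@";
    sleep 300; # wait for 5 minutes
}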
EDIT: improved the sample content retrieval process.
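The sample above re-downloads the page on every pass. If you only care about changes, the mirror method mentioned earlier may be a better fit: it sends an If-Modified-Since header based on the local file's timestamp and leaves the file alone when the server answers 304 Not Modified. Here is a rough sketch of that idea, not a tested crawler. The helper name refresh_copy, the script name poll.pl and the command-line arguments are my own inventions for illustration. It uses a plain LWP::UserAgent; a WWW::Mechanize object should also work since it is a subclass, though with autocheck on it will die on HTTP errors instead of returning the response.

```perl
use strict;
use warnings;

# Hypothetical helper: mirror $url into $file using any object that
# provides LWP::UserAgent's mirror() method.
# Returns 1 if the local file was rewritten, 0 if the server reported
# "304 Not Modified" or the request failed.
sub refresh_copy {
    my ($ua, $url, $file) = @_;
    my $response = $ua->mirror($url, $file);
    return 0 if $response->code == 304;    # unchanged since the last fetch
    unless ($response->is_success) {
        warn "Could not fetch $url: ", $response->status_line, "\n";
        return 0;
    }
    return 1;
}

# Run as: perl poll.pl URL FILE [SECONDS]   (poll.pl is a made-up name)
if (@ARGV >= 2) {
    require LWP::UserAgent;                # loaded only when actually polling
    my ($url, $file, $wait) = @ARGV;
    $wait = 300 unless defined $wait;      # default: 5 minutes, as in the sample above
    my $ua = LWP::UserAgent->new;
    while (1) {
        print "Updated $file\n" if refresh_copy($ua, $url, $file);
        sleep $wait;
    }
}
```

Keep in mind this only saves work if the server honours If-Modified-Since. Many dynamically generated pages return a full 200 response every time, in which case you would have to compare the content yourself.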