How do I fetch and parse HTML with Perl?

How do I do the following in Perl, in order: a) curl a page and save it to a variable, b) parse the value of the variable (which is HTML content) for values I want (ex: the info is kept between tags like ... )


My Perl kung-fu is rusty, but I believe it's something along the following lines.

To fetch something using curl and then extract, for example, the contents of some HTML element:

use strict;
use warnings;
use WWW::Curl::Easy;

my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_URL, 'http://www.example.com/some-url.html');

# Collect the response body into a scalar via an in-memory filehandle.
my $response_body;
open(my $fileb, ">", \$response_body) or die "Cannot open in-memory file: $!";
$curl->setopt(CURLOPT_WRITEDATA, $fileb);
$curl->perform;
my $info = $curl->getinfo(CURLINFO_HTTP_CODE);

$response_body =~ m|<a[^>]+>(.+?)</a>|;

Now, $1 should contain the contents of the A element; if the match failed, $1 will be undefined (or Perl will complain about it being so). You should, of course, first check in $info that the status code is what you expect. This being Perl, it's ugly this way, but it works. However, I recommend not doing this often (and especially not in bigger scripts), as it's certainly the fastest road to shooting yourself in the foot with Perl:
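
For completeness, a minimal sketch of that status check (reusing $info and $response_body from the snippet above) could look like this:

# Only trust the body if the request actually returned 200.
if ($info == 200) {
    if ($response_body =~ m|<a[^>]+>(.+?)</a>|) {
        print "Link text: $1\n";
    } else {
        warn "No <a> element matched\n";
    }
} else {
    warn "Unexpected HTTP status: $info\n";
}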

You shoot yourself in the foot, but nobody can understand how you did it. Six months later, neither can you.

I hope it helps.

P.S. I am sure there is some easier way to do this, with much less code, but I can't remember how it goes...
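
For the record, one lighter-weight combination that is commonly used is LWP::UserAgent for the fetch and HTML::TreeBuilder for the parsing. A minimal sketch, assuming those modules are installed and using a placeholder URL (not necessarily the approach hinted at above):

use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('http://www.example.com/some-url.html');
die "Fetch failed: ", $res->status_line unless $res->is_success;

# Build a DOM-like tree and print the text of every <a> element.
my $tree = HTML::TreeBuilder->new_from_content($res->decoded_content);
for my $a ($tree->look_down(_tag => 'a')) {
    print $a->as_text, "\n";
}
$tree->delete;   # free the parse tree when done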
