How do I fetch and parse HTML with Perl?
How do I do the following in Perl, in order: a) fetch a page with curl and save it to a variable, b) parse the value of that variable (which is HTML content) for the values I want (e.g. info kept between tags like <a>...</a>)?
My Perl kung-fu is rusty, but I believe it's something along the following lines.
To fetch a page with curl and then extract, for example, the contents of some HTML element:
use strict;
use warnings;
use WWW::Curl::Easy;

my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_URL, 'http://www.example.com/some-url.html');

my $response_body = '';                 # buffer the page body here
open(my $fileb, '>', \$response_body); # in-memory filehandle
$curl->setopt(CURLOPT_WRITEDATA, $fileb);

my $retcode = $curl->perform;          # 0 on success
my $info = $curl->getinfo(CURLINFO_HTTP_CODE);
$response_body =~ m|<a[^>]+>(.+?)</a>|;
Now $1 should contain the contents of the <a> element. Be careful: if the match fails, $1 is not reset — it keeps whatever an earlier match put there (or stays undefined if there was none), so you should test the match itself rather than trusting $1. You should also first check in $info that the status code is what you expect, of course. This being Perl, the code is ugly, but it works. However, I recommend not doing this often (and especially not in bigger scripts), as regex-parsing HTML is certainly the fastest road to shooting yourself in the foot with Perl:
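A safer idiom is to use the match in boolean context before touching $1. A minimal sketch on a literal string (in the real script, $response_body would come from the fetch above; note I use [^>]* so that attribute-less <a> tags match too):

```perl
use strict;
use warnings;

my $response_body = '<p>Hi</p> <a href="/x">first link</a> <a>second</a>';

# Only read $1 when the match actually succeeded.
if ($response_body =~ m|<a[^>]*>(.+?)</a>|) {
    print "matched: $1\n";            # prints "matched: first link"
}

# With /g in list context you collect every <a> element's contents at once.
my @links = $response_body =~ m|<a[^>]*>(.+?)</a>|g;
print scalar(@links), " links\n";     # prints "2 links"
```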
You shoot yourself in the foot, but nobody can understand how you did it. Six months later, neither can you.
I hope it helps.
P.S. I am sure that there is some easier way around, without this much code, but I can't remember how it goes...
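One common shorter route (a sketch of what that easier way might look like, using LWP::Simple, whose get() returns the page body directly, or undef on failure; the URL is a placeholder):

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# get() fetches the URL and returns the body as a string, or undef on failure.
my $response_body = get('http://www.example.com/some-url.html');
die "couldn't fetch page" unless defined $response_body;

# Same regex extraction as before, collecting all <a> contents.
my @links = $response_body =~ m|<a[^>]*>(.+?)</a>|g;
print "$_\n" for @links;
```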