How can I get links that match a regex using WWW::Mechanize?
I'm trying to use a regular expression to pick out certain links, but I can't get it to work. I can get all the links, but many of them are ones I don't want.
What I want is to grab only the links that match this pattern:
http://valeptr.com/scripts/runner.php?IM=
Here is the script I'm working on:
use warnings;
use strict;
use WWW::Mechanize;
use WWW::Mechanize::Sleepy;

# WWW::Mechanize::Sleepy (a WWW::Mechanize subclass) is what understands the
# "sleep" option: it pauses a random 5 to 20 seconds between requests.
my $Explorador = WWW::Mechanize::Sleepy->new(
    agent => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
    sleep => '5..20'
);

# Fetch the page so we can find all the links in it
$Explorador->get("file:/home/alejandro/Escritorio/hehe.php.html");

# Uncomment to debug the fetched document:
#print $Explorador->content();

my @links = $Explorador->links;
foreach my $link (@links) {
    # Retrieve the link URL, e.g.
    # http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375.
    my $href = $link->url;
    foreach my $s ($href) { # Here is the regular expression
        my @links = $s =~ qr{
            (
                [^B]*
            )
            $
        }x;
        foreach (@links) {
            print "\n", $_;
        }
    }
}
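To see why this loop prints every link: the pattern ([^B]*)$ captures the longest run of characters other than an uppercase B that reaches the end of the string. It therefore always succeeds, returning the whole URL when there is no B and only the part after the last B otherwise. The sketch below runs the same pattern over a few hypothetical URLs (made up for illustration, not taken from the page) so you can see what it captures.

```perl
use strict;
use warnings;

# The pattern from the loop above, without the /x layout:
# capture the trailing run of characters that are not an uppercase "B".
my $pattern = qr{([^B]*)$};

# Hypothetical sample URLs, for illustration only.
my @urls = (
    'http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375',
    'http://example.com/other/page.html',
    'http://example.com/Books/index.html',
);

my @captured;
for my $url (@urls) {
    # In list context the match returns the capture group. The match
    # cannot fail (an empty run before the end always qualifies), so
    # every URL yields something to print.
    my ($tail) = $url =~ $pattern;
    push @captured, $tail;
    print "$url => $tail\n";
}
```

A pattern that keeps only the wanted links has to describe the prefix itself, for example m{^http://valeptr\.com/scripts/runner\.php\?IM=}.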
PS: I suspect a regular expression like this has been asked about many times before, but I haven't found it. If so, I'll come back and post a link to the existing question.
Problem:
There are a lot of links, and I need to grab only the ones that match this pattern:
http://valeptr.com/scripts/runner.php?IM=
To do that, I need to apply a regular expression inside the loop (line 19 in my script).
The line my @links = $Explorador->links; returns every link on the page,
but I only want to grab the links that match the pattern above.
Sincerely,
Why not get WWW::Mechanize to do the work for you, especially when it can filter out the links for you via a supplied regex?
my @wanted_links = $Explorador->find_all_links (
url_regex => qr{scripts/runner\.php\?IM=}
);
No for loops!
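Note that url_regex is an unanchored match, so this pattern accepts any URL that contains scripts/runner.php?IM= anywhere, including one that merely mentions it in a query string. The plain-Perl sketch below applies the same pattern with grep to a hypothetical list of URLs (standing in for what the page would return), next to an anchored variant built with \Q...\E that requires the full prefix.

```perl
use strict;
use warnings;

# Hypothetical URLs standing in for what $Explorador->links would return.
my @urls = (
    'http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375',
    'http://example.com/index.html',
    'http://example.com/redirect?to=http://valeptr.com/scripts/runner.php?IM=abc',
);

# Same pattern as the url_regex above: it matches anywhere in the URL.
my @loose = grep { m{scripts/runner\.php\?IM=} } @urls;

# Anchored variant: \Q...\E escapes the "." and "?" in the prefix,
# and ^ requires the URL to start with it.
my $prefix   = 'http://valeptr.com/scripts/runner.php?IM=';
my @anchored = grep { m{^\Q$prefix\E} } @urls;

print "loose:    $_\n" for @loose;
print "anchored: $_\n" for @anchored;
```

With WWW::Mechanize the anchored form can be passed straight to find_all_links as url_regex => qr{^\Qhttp://valeptr.com/scripts/runner.php?IM=\E}. If the page uses relative links, url_regex sees the URL as written in the HTML; url_abs_regex matches against the absolute URL instead.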
Since your reference link seems to be fixed, you could consider using substr instead of a regex:
my $ref_link = q!http://valeptr.com/scripts/runner.php?IM=!;
my @save;
foreach my $link ( $Explorador->links ) {
    my $href = $link->url;
    # Keep the URL only if it starts with $ref_link
    if ( substr($href, 0, length($ref_link)) eq $ref_link ) {
        push @save, $href;
    }
}
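The same prefix test can also be written with index, which returns 0 when $ref_link occurs at the very start of the string. Here is a small self-contained sketch comparing the two on a few hypothetical URLs (made up for illustration, not from the page). Both are exact, case-sensitive comparisons, so an https:// link, for instance, is not matched.

```perl
use strict;
use warnings;

my $ref_link = q!http://valeptr.com/scripts/runner.php?IM=!;

# Hypothetical URLs standing in for the page's links.
my @hrefs = (
    'http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375',
    'http://valeptr.com/scripts/other.php?IM=1',
    'https://valeptr.com/scripts/runner.php?IM=2',
);

# Prefix test with substr, as in the answer above.
my @by_substr = grep { substr($_, 0, length $ref_link) eq $ref_link } @hrefs;

# Equivalent prefix test with index: the prefix must be found at position 0.
my @by_index = grep { index($_, $ref_link) == 0 } @hrefs;

print "$_\n" for @by_substr;
```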