Parsing / Extracting the inside of an HTML Tag using Perl?
I've been searching for this over the past couple of days but still haven't found a clear way to do it. I know it's simple to parse HTML with Perl to retrieve the text between tags, but I need to retrieve the text inside a tag instead, such as this:
<input type="hidden" name="next_webapp_page" value=""/>
Here, I want to extract the entire tag (or possibly the tag without the word "input"). I don't want to use a regex; I'd prefer to use a parser. Any advice is appreciated.
Using HTML::TokeParser::Simple, look for input tags and print each one using the as_is method. Example:
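#!/usr/bin/perl
use strict; use warnings;
use HTML::TokeParser::Simple;

# Parse the HTML from a string rather than a file or URL
my $parser = HTML::TokeParser::Simple->new(
    string => '<input type="hidden" name="next_webapp_page" value=""/>'
);

# Visit every <input> start tag in the document
while ( my $tag = $parser->get_tag('input') ) {
    # as_is returns the tag exactly as it appeared in the source
    print $tag->as_is, "\n";
    for my $attr ( qw( type name value ) ) {
        # Fall back to an empty string if the attribute is missing
        printf qq{%s="%s"\n}, $attr, $tag->get_attr($attr) // '';
    }
}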
Output:
<input type="hidden" name="next_webapp_page" value=""/>
type="hidden"
name="next_webapp_page"
value=""
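If the attribute names aren't known in advance, or you want "the tag excluding the word input" as a single string, a rough sketch along these lines might help. It calls get_attr with no argument to get all attributes as a hash reference, and get_attrseq to get their names in source order. The helper name input_attributes is made up for illustration, and skipping an attribute named "/" is an assumption about how the underlying parser reports the trailing slash of a self-closing tag.

```perl
#!/usr/bin/perl
use strict; use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): returns one string per
# <input> tag, holding that tag's attributes in their original order.
sub input_attributes {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html );
    my @results;
    while ( my $tag = $parser->get_tag('input') ) {
        my $attrs = $tag->get_attr;     # hash ref: name => value
        my $order = $tag->get_attrseq;  # array ref: names in source order
        push @results, join ' ',
            map  { sprintf '%s="%s"', $_, $attrs->{$_} // '' }
            grep { $_ ne '/' }          # assumption: "/" may show up as an attribute
            @$order;
    }
    return @results;
}

print "$_\n" for input_attributes(
    '<input type="hidden" name="next_webapp_page" value=""/>'
);
```

The rebuilt string comes from the parsed attributes rather than the raw source, so original quoting and spacing are not preserved. Use as_is when you need the tag verbatim.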