Perl parse links from HTML Table
I'm trying to get the links from a table in an HTML page. Using HTML::TableExtract, I can parse the table and get the text (i.e. Ability and Abnormal in the example below), but I cannot get the links contained in the table. For example:
<table id="AlphabetTable">
<tr>
<td>
<a href="/cate/A/Ability">Ability</a> <span class="count">2650</span>
</td>
<td>
<a href="/cate/A/Abnormal">Abnormal</a> <span class="count">26</span>
</td>
</tr>
</table>
Is there a way to get the links using HTML::TableExtract, or is there another module that could be used in this situation? Thanks.
Part of my code:
$mech->get($link->url());
$te->parse($mech->content);
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        print $row->[0];    # it only prints the text part,
                            # but I want its link
    }
}
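Use HTML::LinkExtor, passing the extracted cell content to its parse method. This only finds links if the cells still contain their HTML, so the HTML::TableExtract object has to be created with keep_html => 1; by default the cells hold only the visible text.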
my $le = HTML::LinkExtor->new();
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        $le->parse($row->[0]);
        for my $link_tag ($le->links) {
            # each entry is [ tag, attribute => url, ... ]
            my ($tag, %links) = @$link_tag;
            # next if $tag ne 'a';    # exclude other kinds of links?
            print "$_\n" for values %links;
        }
    }
}
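To make the shape of what links returns more concrete, here is a minimal sketch that runs HTML::LinkExtor on the raw HTML of a single cell. The $cell_html string is copied by hand from the sample table in the question and stands in for a cell extracted with keep_html enabled; the @hrefs array is just an illustrative name.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Raw HTML of one cell, copied from the sample table in the question.
my $cell_html = '<a href="/cate/A/Ability">Ability</a> <span class="count">2650</span>';

my $le = HTML::LinkExtor->new;
$le->parse($cell_html);
$le->eof;    # signal end of input so nothing stays buffered in the parser

my @hrefs;
for my $link_tag ($le->links) {
    # Each entry is an array ref: tag name followed by attribute/value pairs.
    my ($tag, %attrs) = @$link_tag;
    next unless $tag eq 'a';    # keep only anchor links
    push @hrefs, $attrs{href};
}

print "$_\n" for @hrefs;
```

Since no base URL is passed to the constructor, the href values come back exactly as written in the HTML, so relative links stay relative.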
Use the keep_html option in the constructor. From the HTML::TableExtract documentation:
keep_html
Return the raw HTML contained in the cell, rather than just the visible text. Embedded tables are not retained in the HTML extracted from a cell. Patterns for header matches must take into account HTML in the string if this option is enabled. This option has no effect if extracting into an element tree structure.
$te = HTML::TableExtract->new( keep_html => 1, headers => [qw(field1 ... fieldN)]);
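As a rough illustration of what this option changes, the sketch below parses the sample table from the question with keep_html => 1 and collects the cell contents. It assumes no headers option is needed, so every table in the input is matched; the $html string and the @cells array are illustrative names, not part of the original code.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Sample table from the question, with the row closed properly.
my $html = <<'END_HTML';
<table id="AlphabetTable">
<tr>
<td>
<a href="/cate/A/Ability">Ability</a> <span class="count">2650</span>
</td>
<td>
<a href="/cate/A/Abnormal">Abnormal</a> <span class="count">26</span>
</td>
</tr>
</table>
END_HTML

# keep_html => 1 keeps the markup inside each cell instead of only the text.
my $te = HTML::TableExtract->new(keep_html => 1);
$te->parse($html);
$te->eof;

my @cells;
for my $ts ($te->tables) {
    for my $row ($ts->rows) {
        # Skip undefined entries, which can appear in ragged rows.
        push @cells, grep { defined } @$row;
    }
}

print "[$_]\n" for @cells;
```

Each cell now carries the a element with its href attribute, so it can be handed to any link parser, or to a full HTML parser if more than the URL is needed.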