
Perl HTML::TreeBuilder parsing: can't select a DIV by class, getting error "Use of uninitialized value in pattern match"

I'm new to using Perl's HTML::TreeBuilder module for HTML parsing and can't figure out what the issue is here. I have spent a few hours trying to get this to work and looked at a few tutorials, but I am still getting this error: "Use of uninitialized value in pattern match", referring to this line in my code:

    sub{ $_[0]->tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/) }

This error is printed many times in the terminal. I have checked everything over and over, and the code is definitely getting its input: $downloadedpage holds a full HTML file that contains the string I give below. Any advice is greatly appreciated.

Sample string, contained within the $downloadedpage variable:

        <div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world&#8217;s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>

My code:

    my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    #the article is in the div with class "snap_preview"
    @article = $tree->look_down(
    sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
    );


Using the exact code and example you gave,

use warnings;
use strict;
use HTML::TreeBuilder;
my $downloadedpage=<<EOF;
<div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world&#8217;s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>
EOF

my $tree = HTML::TreeBuilder->new();
$tree->parse($downloadedpage);
$tree->eof();

# the article is in the div with class "snap_preview"
my @article = $tree->look_down(
    sub { $_[0]->tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/) }
);

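I don't get any errors at all. My first guess would be that the full page contains some <div>s which don't have a class attribute. For those, attr('class') returns undef, and matching undef against a regex is what triggers the warning.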
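To see the warning in isolation, here is a minimal core-Perl sketch. The %attr hash is only a hypothetical stand-in for an element that has no class attribute; it is not HTML::TreeBuilder itself, and the __WARN__ handler is there just to count the warnings.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical element attributes: note there is no 'class' key.
my %attr = ( id => 'main' );

# Matching an undefined value against a regex emits
# "Use of uninitialized value ... in pattern match".
my $unguarded = ( $attr{class} =~ /snap_preview/ ) ? 1 : 0;

# Defaulting to '' first (defined-or, Perl 5.10+) emits no warning.
my $guarded = ( ( $attr{class} // '' ) =~ /snap_preview/ ) ? 1 : 0;

print "warnings captured: ", scalar(@warnings), "\n";
```

Only the unguarded match should add an entry to @warnings, which would explain why the message shows up once per class-less <div> on a real page.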

Maybe you need to write

sub{
     $_[0]-> tag() eq 'div' and 
     $_[0]->attr('class') and 
     ($_[0]->attr('class') =~ /snap_preview/)
}

there?
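As a sketch of two other ways to write the same guard (assuming HTML::TreeBuilder is installed; the two-div input below is a made-up sample, not the asker's page): default the attribute to an empty string inside the callback, or pass attribute/value criteria to look_down, which, as far as I know, accepts a qr// pattern as the value and skips elements that lack the attribute.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample input: one div without a class attribute, one with it.
my $html = <<'EOF';
<div><p>no class here</p></div>
<div class="snap_preview"><p>article text</p></div>
EOF

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# Variant 1: callback that defaults a missing class to '' (needs Perl 5.10+),
# so the regex never sees undef.
my @by_sub = $tree->look_down(
    sub {
        my $class = $_[0]->attr('class') // '';
        $_[0]->tag eq 'div' and $class =~ /snap_preview/;
    }
);

# Variant 2: attribute/value criteria, with a regex as the class value.
my @by_criteria = $tree->look_down( _tag => 'div', class => qr/snap_preview/ );

print scalar(@by_sub), " match(es) via callback, ",
      scalar(@by_criteria), " via criteria\n";
```

The criteria form is shorter, but the callback is the better fit if the condition grows beyond simple attribute checks.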
