How can I replace text that is not part of an anchor tag in Perl?

What is a Perl regex that can replace select text that is not part of an anchor tag? For example I would like to replace only the last "text" in the following code.

blah <a href="http://www.text.com"> blah text blah </a> blah text blah.

Thanks.


You don't want to try to parse HTML with a regex. Try HTML::TreeBuilder instead.

use strict;
use warnings;
use HTML::TreeBuilder;

my $html = HTML::TreeBuilder->new_from_file('file.html');
# or some other method, depending on where your HTML is

doReplace($html);
print $html->as_HTML;  # serialize the modified tree

sub doReplace
{
  my $elt = shift;

  foreach my $node ($elt->content_refs_list) {
    if (ref $$node) {
      doReplace($$node) unless $$node->tag eq 'a';
    } else {
      $$node =~ s/text/replacement/g;
    } # end else this is a text node
  } # end foreach $node

} # end doReplace


I have temporarily prevailed:

$html =~ s|(text)([^<>]*?<)(?!\/a>)|replacement$2|is;

but I was dispirited, dismayed, and enervated by the seminal text; and so shall pursue TreeBuilder in subsequent endeavors.
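
For anyone tempted to reuse that pattern, here is a quick check of how it behaves on a few inputs. This is only a sketch: `try_pattern` is a hypothetical helper name, and the second and third inputs are made up for illustration (they are not from the question).

```perl
use strict;
use warnings;

# Apply the pattern from the post above to a copy of the input and return it.
# (try_pattern is a hypothetical helper name used only for this check.)
sub try_pattern {
    my ($html) = @_;
    $html =~ s|(text)([^<>]*?<)(?!\/a>)|replacement$2|is;
    return $html;
}

my $sample = 'blah <a href="http://www.text.com"> blah text blah </a> blah text blah.';

# 1. The sample on its own: the last "text" is never followed by a "<",
#    so the pattern finds nothing to replace and the string is unchanged.
print try_pattern($sample), "\n";

# 2. Wrapped in a paragraph (made-up wrapper): the closing </p> supplies
#    the "<", so the last "text" is replaced.
print try_pattern("<p>$sample</p>"), "\n";

# 3. Markup nested inside the anchor (made-up input): "text" is followed
#    by "<b>", not "</a>", so the link text itself gets rewritten.
print try_pattern('<a href="#">text <b>more</b></a>'), "\n";
```

So the pattern depends on what happens to follow the word, and without `/g` it only ever touches the first match. That is roughly why a parser is the safer route.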


Don't use regexps for this kind of thing. Use a proper HTML parser, and apply a plain regexp only to the parts of the HTML that you're interested in.
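
One way to do that, as a rough sketch rather than the definitive approach, is to let HTML::TokeParser split the document into tokens and run an ordinary substitution only on text tokens that sit outside an anchor. The helper name `replace_outside_anchors` and its parameters are my own invention for this example; the markup itself is copied through untouched.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: replace $from with $to in text outside <a>...</a>,
# copying every tag, comment and declaration through as-is.
sub replace_outside_anchors {
    my ($html, $from, $to) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    $parser->unbroken_text(1);    # keep each run of text in a single token
    my ($out, $in_anchor) = ('', 0);

    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'S') {       # start tag: [S, tag, attr, attrseq, text]
            $in_anchor++ if $token->[1] eq 'a';
            $out .= $token->[4];
        } elsif ($type eq 'E') {  # end tag: [E, tag, text]
            $in_anchor-- if $token->[1] eq 'a' && $in_anchor > 0;
            $out .= $token->[2];
        } elsif ($type eq 'T') {  # text: [T, text, is_data]
            my $text = $token->[1];
            $text =~ s/\Q$from\E/$to/g unless $in_anchor;
            $out .= $text;
        } else {                  # comment, declaration, processing instruction
            $out .= $token->[-1]; # original text is the last element
        }
    }
    return $out;
}

my $sample = 'blah <a href="http://www.text.com"> blah text blah </a> blah text blah.';
print replace_outside_anchors($sample, 'text', 'replacement'), "\n";
```

The regexp here only ever sees plain text, so attribute values such as the `href` and anything inside the link are never candidates for replacement.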
