How can I replace text that is not part of an anchor tag in Perl?
What is a Perl regex that can replace select text that is not part of an anchor tag? For example, I would like to replace only the last "text" in the following code.
blah <a href="http://www.text.com"> blah text blah </a> blah text blah.
Thanks.
You don't want to try to parse HTML with a regex. Try HTML::TreeBuilder instead.
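use strict;
use warnings;
use HTML::TreeBuilder;
my $html = HTML::TreeBuilder->new_from_file('file.html');
# or new_from_content($string), depending on where your HTML is
doReplace($html);
print $html->as_HTML;   # write out the modified document
$html->delete;          # free the tree when done
# Replace "text" in every text node, but never descend into <a> elements.
sub doReplace
{
    my $elt = shift;
    foreach my $node ($elt->content_refs_list) {
        if (ref $$node) {
            # An element: recurse into it unless it is an anchor
            doReplace($$node) unless $$node->tag eq 'a';
        } else {
            # A text node: edit it in place through the reference
            $$node =~ s/text/replacement/g;
        } # end else this is a text node
    } # end foreach $node
} # end doReplace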
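A minimal usage sketch on the question's own sample string, assuming HTML::TreeBuilder (from the HTML-Tree distribution) is installed. It parses the string with new_from_content instead of a file and walks the tree the same way as doReplace. The walk is generalized into a hypothetical helper, replace_outside, that takes the tag to skip, a pattern and a replacement, and the result is printed with as_HTML.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical generalization of doReplace: apply s/$re/$repl/g to every
# text node, but skip the whole subtree of any element tagged $skip_tag.
sub replace_outside {
    my ($elt, $skip_tag, $re, $repl) = @_;
    foreach my $node ($elt->content_refs_list) {
        if (ref $$node) {
            # Element node: recurse unless it is the tag we must not touch
            replace_outside($$node, $skip_tag, $re, $repl)
                unless $$node->tag eq $skip_tag;
        } else {
            # Text node: edit it in place through the reference
            $$node =~ s/$re/$repl/g;
        }
    }
}

# Parse the question's sample from a string rather than a file
my $tree = HTML::TreeBuilder->new_from_content(
    'blah <a href="http://www.text.com"> blah text blah </a> blah text blah.'
);
replace_outside($tree, 'a', qr/text/, 'replacement');
print $tree->as_HTML, "\n";
$tree->delete;    # break the tree's parent/child reference cycles
```

Only the trailing "text" changes; the href and the anchor's own text are left alone. Because TreeBuilder builds a full document, as_HTML also prints the implied html, head and body tags around the fragment.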
I have temporarily prevailed:
$html =~ s{text(?=[^<>]*(?:<(?!/a>)|\z))}{replacement}gi;
but I was dispirited, dismayed, and enervated by the seminal text, and so shall pursue TreeBuilder in subsequent endeavors.
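A quick check of a lookahead variant of the regex above, wrapped in a hypothetical helper, replace_outside_anchor_re. It only peeks at the next tag after each match. The match is skipped when that tag is </a>, and allowed when the next tag is anything else or when no tag follows at all (the \z branch, which is what lets the trailing "text" in the question's sample be replaced). The second call shows why this stays a stopgap: text in a nested tag inside an anchor is still replaced.

```perl
use strict;
use warnings;

# Replace 'text' only when the next tag after it is not a closing </a>,
# or when no tag follows. Heuristic only: it looks at just the next tag,
# so it cannot see an <a> that encloses the match further out.
sub replace_outside_anchor_re {
    my ($html) = @_;
    $html =~ s{text(?=[^<>]*(?:<(?!/a>)|\z))}{replacement}gi;
    return $html;
}

my $sample = 'blah <a href="http://www.text.com"> blah text blah </a> blah text blah.';
print replace_outside_anchor_re($sample), "\n";
# prints: blah <a href="http://www.text.com"> blah text blah </a> blah replacement blah.

# The heuristic breaks when the text sits in a nested tag inside the anchor:
print replace_outside_anchor_re('<a href="#"><b>text</b></a>'), "\n";
# prints: <a href="#"><b>replacement</b></a>
```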
Don't use regexps for this kind of stuff. Use a proper HTML parser, and apply plain regexps only to the parts of the HTML you're interested in, such as the text outside anchors.
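One way to follow this advice, sketched with HTML::Parser (the event-based parser HTML::TreeBuilder is built on, assumed installed): copy every tag through verbatim, count how many a elements are currently open, and run the plain regexp only on text events seen outside any anchor. The function name replace_outside_anchors is hypothetical.

```perl
use strict;
use warnings;
use HTML::Parser;

# Replace $from with $to in text that is not inside an <a>...</a> element.
# Tags, attributes and comments are copied through unchanged.
sub replace_outside_anchors {
    my ($html, $from, $to) = @_;
    my $out   = '';
    my $depth = 0;    # how many <a> elements we are currently inside

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            $depth++ if $tag eq 'a';
            $out .= $text;               # original source of the start tag
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $depth-- if $tag eq 'a' && $depth > 0;
            $out .= $text;
        }, 'tagname, text' ],
        text_h => [ sub {
            my ($text) = @_;
            $text =~ s/\Q$from\E/$to/g unless $depth;   # only outside anchors
            $out .= $text;
        }, 'text' ],
        default_h => [ sub { $out .= shift }, 'text' ],  # comments, doctype, etc.
    );
    $p->unbroken_text(1);    # deliver each run of text as a single event
    $p->parse($html);
    $p->eof;
    return $out;
}

my $html = 'blah <a href="http://www.text.com"> blah text blah </a> blah text blah.';
print replace_outside_anchors($html, 'text', 'replacement'), "\n";
```

Unlike the TreeBuilder approach, this streams the input and leaves the rest of the markup byte-for-byte as it was, but it trusts the document's a tags to be balanced.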