Strange result from a Perl regexp: end-of-string anchor and non-greedy quantifier at once
I have a very simple substitution:
my $s = "<a>test</a> <a>test</a>";
$s =~ s{ <a> .+? </a> $ }{WHAT}x;
print "$s\n";
That prints:
WHAT
But I was expecting:
<a>test</a> WHAT
What am I misunderstanding about the end-of-string anchor when it is combined with a non-greedy quantifier?
So, I was wrong about the regexp engine. Indeed, don't humanize code: it does exactly what you wrote, not what you think you wrote.
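It just finds the first <a>, then extends .+? until it reaches a </a> that is followed by the end of the string. That first attempt, starting at the very first <a>, succeeds, so the pattern matches the whole string.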
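To see this concretely, here is a minimal sketch that captures the text matched by the original pattern and reports the offset where the match begins, using Perl's @- array (whose first element is the start offset of the whole match). It assumes the same input string as in the question.

```perl
use strict;
use warnings;

my $s = "<a>test</a> <a>test</a>";
my ($matched, $offset);

# Same pattern as in the question, with the whole match captured.
if ($s =~ m{ (<a> .+? </a>) $ }x) {
    ($matched, $offset) = ($1, $-[0]);
    # prints: matched '<a>test</a> <a>test</a>' starting at offset 0
    print "matched '$matched' starting at offset $offset\n";
}
```

The match starts at offset 0, so the non-greedy .+? had to swallow the first </a> and the second <a> to reach a </a> at the end of the string.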
The right pattern should be something like:
$s =~ s{ <a> (?! .* <a> ) .* </a> }{WHAT}x;
That correctly gives me
<a>test</a> WHAT
because now I am really asking the regexp for the last <a>.
I think it's less efficient than [^<]+, but more flexible.
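The "more flexible" claim can be checked with a small comparison, sketched below. The second input, with a tag nested inside the last <a>, is a hypothetical example chosen for illustration, and writing the [^<]+ variant as <a>[^<]+</a>$ is an assumption about what the comparison refers to.

```perl
use strict;
use warnings;

# Hypothetical test inputs: the string from the question, plus one whose
# last <a> element contains a nested tag.
my @inputs = ("<a>test</a> <a>test</a>", "<a>x</a> <a><b>y</b></a>");
my %result;

for my $input (@inputs) {
    # Negative lookahead: match an <a> that has no later <a> after it.
    (my $lookahead = $input) =~ s{ <a> (?! .* <a> ) .* </a> }{WHAT}x;
    # Character class: the element body may not contain '<' at all.
    (my $charclass = $input) =~ s{ <a> [^<]+ </a> $ }{WHAT}x;
    $result{$input} = [$lookahead, $charclass];
    print "input:     $input\nlookahead: $lookahead\ncharclass: $charclass\n\n";
}
```

Both patterns replace the last element of the original string, but only the lookahead version handles the nested <b> tag; the character-class version leaves that input unchanged because [^<]+ cannot cross the inner tag.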
This is one of the reasons you shouldn't use a regex to match HTML. Try using a parser instead. See this question and its answers for more reasons not to use a regex, and this question and its answers for examples of how to use an HTML parser.
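The non-greedy modifier (like regexes in general) works from left to right. The engine commits to the leftmost position where a match can start, so in essence what is happening here is that it finds the first <a> and then the shortest string after it that ends in a </a> at the end of the string. Non-greediness only controls how long the match is from that starting point; it never moves the start to a later <a>.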
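The leftmost rule can be illustrated with a sketch that anchors the question's pattern at every possible start offset (via substr and ^) and records where it could succeed. This is only an illustration of the behavior, not how Perl's engine is actually implemented.

```perl
use strict;
use warnings;

my $s = "<a>test</a> <a>test</a>";

# Anchor the pattern at each start offset and keep the offsets where it
# succeeds. The real engine tries offsets left to right and stops at the
# first success.
my @starts = grep { substr($s, $_) =~ m{ ^ <a> .+? </a> $ }x } 0 .. length($s) - 1;

print "pattern can match starting at: @starts\n";   # 0 12
print "engine reports the leftmost:   $starts[0]\n"; # 0
```

Offset 12 (the second <a>) would give the expected result, but offset 0 also succeeds and comes first, so that is the match you get.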
This does what you would expect:
my $s="<a>test</a> <a>test</a>";
$s =~ s#<a>[^<>]+</a>$#WHAT#;
print "$s\n";
What is the problem you're trying to solve?