Why aren't my nested lookarounds working correctly in my Perl substitution?
I have a Perl substitution which converts hyperlinks to lowercase:
's/(?<=<a href=")([^"]+)(?=")/\L$1/g'
I want the substitution to ignore any links which begin with a hash. For example, I want it to change the path in <a href="FooBar/Foo.bar">Foo Bar</a> to lowercase, but skip <a href="#Bar">Bar</a>.
Nesting lookaheads to instruct it to skip these links isn't working correctly for me. This is the one-liner I've written:
perl -pi -e 's/(?<=<a href=" (?! (?<=<a href="#) ) )([^"]+)(?=")/\L$1/g' *;
Could anyone hint to me where I have gone wrong with this substitution? It executes just fine, but does not do anything.
As near as I can tell, your initial regex will work just fine if you add the condition that the first character in the link may not be a hash # or a double quote, e.g. [^#"]:
s/(?<=<a href=")([^#"][^"]+)(?=")/\L$1/gi;
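To see the effect on the two sample links from the question, here is a minimal sketch. The helper name lowercase_links is made up for illustration; the substitution itself is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase href values unless they start with '#'.
sub lowercase_links {
    my ($html) = @_;
    # [^#"] rejects a leading hash (and an empty href); [^"]+ takes the rest.
    $html =~ s/(?<=<a href=")([^#"][^"]+)(?=")/\L$1/gi;
    return $html;
}

print lowercase_links('<a href="FooBar/Foo.bar">Foo Bar</a>'), "\n";
# prints <a href="foobar/foo.bar">Foo Bar</a>
print lowercase_links('<a href="#Bar">Bar</a>'), "\n";
# prints <a href="#Bar">Bar</a>
```

Note that [^#"][^"]+ needs at least two characters, so a one-character link such as href="A" is left alone; writing [^"]* for the second part would cover that case too. As an aside, the one-liner in the question has no /x modifier, so the spaces inside its pattern are matched literally, which is probably why it never matches anything.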
In case you have links which contain a hash but do not start with one, e.g. <a href="FooBar/Foo.bar#BarBar">Foo Bar</a>, it becomes slightly more complicated:
s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;
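We now have to evaluate the replacement as an expression (the /e modifier), since otherwise we get uninitialized value warnings when the optional anchor reference is not present. Note that the // (defined-or) operator requires Perl 5.10 or later.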
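A small sketch of how this behaves with and without a fragment, assuming the same sample markup as in the question. The helper name lowercase_path is only illustrative.

```perl
use strict;
use warnings;
use 5.010;    # the // (defined-or) operator needs Perl 5.10+

# Hypothetical helper: lowercase the path part of an href and keep any
# "#fragment" exactly as written.
sub lowercase_path {
    my ($html) = @_;
    # $1 is the path, $2 the optional fragment; $2 is undef when the
    # link has no fragment, hence the // "" fallback.
    $html =~ s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;
    return $html;
}

print lowercase_path('<a href="FooBar/Foo.bar#BarBar">Foo Bar</a>'), "\n";
# prints <a href="foobar/foo.bar#BarBar">Foo Bar</a>
print lowercase_path('<a href="FooBar/Foo.bar">Foo Bar</a>'), "\n";
# prints <a href="foobar/foo.bar">Foo Bar</a>
print lowercase_path('<a href="#Bar">Bar</a>'), "\n";
# prints <a href="#Bar">Bar</a>
```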
You don't need look-arounds, from what I see:
use 5.010;
...
s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;
\K means "keep" everything before it. It amounts to a variable-length look-behind.
From perlre: "For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string."
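As a rough sketch of the \K version in action: the sample input with extra whitespace around href and = is an assumption added here to show the variable-length prefix, and the helper name lowercase_links_keep is made up.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10+

# Hypothetical helper: same goal as before, using \K instead of a look-behind.
sub lowercase_links_keep {
    my ($html) = @_;
    # Everything matched before \K is kept as-is; only the captured href
    # value and the closing quote are replaced. Inside a bracketed
    # character class, # stays literal even under /x.
    $html =~ s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;
    return $html;
}

print lowercase_links_keep('<a   href = "FooBar/Foo.bar">Foo Bar</a>'), "\n";
# prints <a   href = "foobar/foo.bar">Foo Bar</a>
print lowercase_links_keep('<a href="#Bar">Bar</a>'), "\n";
# prints <a href="#Bar">Bar</a>
```

Because the prefix contains \s+ and \s*, it has no fixed length, which is the situation where \K is more convenient than (?<=...).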