Why aren't my nested lookarounds working correctly in my Perl substitution?

I have a Perl substitution which converts hyperlinks to lowercase:

's/(?<=<a href=")([^"]+)(?=")/\L$1/g'

I want the substitution to ignore any links which begin with a hash, for example I want it to change the path in <a href="FooBar/Foo.bar">Foo Bar</a> to lowercase but skip if it comes across <a href="#Bar">Bar</a>.

Nesting lookaheads to instruct it to skip these links isn't working correctly for me. This is the one-liner I've written:

perl -pi -e 's/(?<=<a href=" (?! (?<=<a href="#) ) )([^"]+)(?=")/\L$1/g' *;

Could anyone hint to me where I have gone wrong with this substitution? It executes just fine, but does not do anything.


As near as I can tell, your initial regex will work just fine if you add the condition that the first character of the link may be neither a hash # nor a double quote, i.e. [^#"]

s/(?<=<a href=")([^#"][^"]*)(?=")/\L$1/gi;
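A minimal sketch of how that substitution could be checked against the two links from the question. The helper name lc_links is hypothetical, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the href value unless it starts with '#'.
sub lc_links {
    my ($html) = @_;
    # [^#"] rejects a leading hash (or an empty href);
    # [^"]* (rather than +) lets a one-character link match too.
    $html =~ s/(?<=<a href=")([^#"][^"]*)(?=")/\L$1/gi;
    return $html;
}

print lc_links('<a href="FooBar/Foo.bar">Foo Bar</a>'), "\n";
# prints <a href="foobar/foo.bar">Foo Bar</a>
print lc_links('<a href="#Bar">Bar</a>'), "\n";
# prints <a href="#Bar">Bar</a> (unchanged)
```

Only the attribute value is captured, so the link text between the tags keeps its case.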

If some links carry a fragment after the path, e.g. <a href="FooBar/Foo.bar#BarBar">Foo Bar</a>, and only the path should be lowercased, it becomes slightly more complicated:

s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;

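The /e modifier is needed here so that the replacement is evaluated as Perl code: with a plain \L$1\E$2 replacement, Perl warns about an uninitialized $2 whenever the optional fragment is absent. Note that the defined-or operator // requires Perl 5.10 or later.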
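As a rough illustration, the /e variant can be wrapped in a small function and tried on links with and without a fragment. The name lc_path_only is made up for this sketch.

```perl
use strict;
use warnings;
use 5.010;    # the defined-or operator // needs Perl 5.10 or later

# Hypothetical helper: lowercase the path part of an href and leave
# any trailing #fragment exactly as it was.
sub lc_path_only {
    my ($html) = @_;
    # $1 is the path, $2 the optional fragment (undef when absent).
    $html =~ s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;
    return $html;
}

print lc_path_only('<a href="FooBar/Foo.bar#BarBar">Foo Bar</a>'), "\n";
# prints <a href="foobar/foo.bar#BarBar">Foo Bar</a>
print lc_path_only('<a href="FooBar/Foo.bar">Foo Bar</a>'), "\n";
# prints <a href="foobar/foo.bar">Foo Bar</a>
print lc_path_only('<a href="#Bar">Bar</a>'), "\n";
# prints <a href="#Bar">Bar</a> (unchanged)
```

Because [^#"]+ must match at least one non-hash character first, a link that begins with a hash never matches and is left alone.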


You don't need look-arounds, from what I see

use 5.010;
...

s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;

\K means "keep": everything matched before it is left out of the text that gets replaced. It amounts to a variable-length look-behind.

perlre:

For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string.
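A short sketch of the \K approach, assuming Perl 5.10 or later. The function name lc_links_keep is hypothetical, and the irregular spacing in the first sample input is only there to show that the \s+ and \s* parts tolerate it.

```perl
use strict;
use warnings;
use 5.010;    # \K is available from Perl 5.10

# Hypothetical helper: everything matched before \K stays in place,
# so only the href value and its closing quote are rewritten.
sub lc_links_keep {
    my ($html) = @_;
    # Under /x the whitespace in the pattern is ignored; the '#' is
    # literal because it sits inside a bracketed character class.
    $html =~ s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;
    return $html;
}

print lc_links_keep('<a  href = "FooBar/Foo.bar">Foo Bar</a>'), "\n";
# prints <a  href = "foobar/foo.bar">Foo Bar</a>
print lc_links_keep('<a href="#Bar">Bar</a>'), "\n";
# prints <a href="#Bar">Bar</a> (unchanged)
```

Unlike the fixed-length look-behind (?<=<a href="), the part before \K may vary in length, which is why optional whitespace around the equals sign can be allowed here.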
