Regex to replace ampersands, but not when they're in a URL

So I have this regex:

&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)

That matches every & in a block of text that isn't already the start of an entity (such as &amp; or &#38;).
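To make the lookahead's behaviour concrete, here is a minimal sketch of the same pattern in Python. Python's `re` module accepts it unchanged; the names `BARE_AMP` and `encode_bare_ampersands` are hypothetical, chosen only for illustration. Named and numeric entities are left alone, so they are not double-encoded:

```python
import re

# The question's pattern: an "&" that is NOT followed by an entity body
# such as "amp;", "#38;" or "#x26;" (negative lookahead).
BARE_AMP = re.compile(r'&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)')

def encode_bare_ampersands(text: str) -> str:
    """Replace every "&" that does not already start an entity with "&amp;"."""
    return BARE_AMP.sub('&amp;', text)

print(encode_bare_ampersands('fish & chips &amp; peas &#38; &#x26;'))
# fish &amp; chips &amp; peas &#38; &#x26;
```

Note that an & in a query string such as `mything=2&this=4` is still matched, because `this=4` is not followed by `;`.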

However, if I have this string:

& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>
---------------------------------------------------------^

... the marked & also gets targeted, and since I'm using the regex to replace the &'s with &amp;, the URL then becomes invalid:

http://localhost/MyFile.aspx?mything=2&amp;this=4

D'oh! Does anyone know of a better way of encoding &'s that are not in a URL?


No, the URL does not become invalid. The HTML code becomes:

<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">

This means that markup which was not correctly encoded is now correctly encoded, and the actual URL that the link contains is:

http://localhost/MyFile.aspx?mything=2&this=4

So it's not a problem that the & character in the HTML gets encoded; on the contrary, the HTML is now correct.
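One way to check this is to let an HTML parser read the encoded tag back. This sketch uses Python's standard `html.parser.HTMLParser`, which unescapes character references in attribute values, and `urllib.parse` to split the query string. The class name `HrefCollector` is hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

class HrefCollector(HTMLParser):
    """Collects the (already unescaped) href of every <a> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)

parser = HrefCollector()
parser.feed('<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">My Text &amp;</a>')
href = parser.hrefs[0]
print(href)                            # http://localhost/MyFile.aspx?mything=2&this=4
print(parse_qs(urlsplit(href).query))  # {'mything': ['2'], 'this': ['4']}
```

The parser hands back the decoded URL, with both query parameters intact.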


In PowerShell this could be done as:

$String ='& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
$String -replace '(?<!<[^<>]*)&', "&amp;"

yields

&amp; &amp; &amp; &amp; &amp; <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &amp;</a>

Dissecting the regex:

  1. The negative lookbehind (?<!<[^<>]*) first checks that the & is not inside a tag, i.e. that it is not preceded by a < with no other < or > in between.
  2. Every & that passes this check is then replaced with &amp;.
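Python's `re` module only accepts fixed-width lookbehind, so `(?<!<[^<>]*)` can't be ported directly (compiling it raises `re.error`). The sketch below swaps in a different technique that applies the same rule: match either a `<` plus everything up to the next `<` or `>`, or a bare `&`, and put the tag text back untouched. An & is therefore left alone exactly when the lookbehind would have rejected it. The names `TAG_OR_AMP` and `encode_ampersands_outside_tags` are hypothetical:

```python
import re

# Alternative 1 ("tag"): a "<" followed by everything up to the next "<" or ">".
# Alternative 2: a bare "&". Any "&" swallowed by alternative 1 is never seen
# by alternative 2, which mimics the (?<!<[^<>]*) lookbehind.
TAG_OR_AMP = re.compile(r'(?P<tag><[^<>]*)|&')

def encode_ampersands_outside_tags(text: str) -> str:
    def repl(m: re.Match) -> str:
        # Tag text goes back unchanged; a bare "&" becomes "&amp;".
        return m.group('tag') if m.group('tag') is not None else '&amp;'
    return TAG_OR_AMP.sub(repl, text)

s = '& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
print(encode_ampersands_outside_tags(s))
# &amp; &amp; &amp; &amp; &amp; <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &amp;</a>
```

Like the PowerShell version, this turns an existing &amp; outside a tag into &amp;amp;. Replacing the bare `&` branch with the question's `&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)` would avoid that double encoding.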