Regex to replace ampersands, but not when they're in a URL
So I have this regex:
&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)
That matches all &'s in a block of text
However, if I have this string:
& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>
---------------------------------------------------------^
... the marked & also gets targeted, and as I'm using it to replace the &'s with &amp;, the URL then becomes invalid:
http://localhost/MyFile.aspx?mything=2&amp;this=4
D'oh! Does anyone know of a better way of encoding &'s that are not in a URL?
No, the URL does not become invalid. The HTML code becomes:
<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">
This means that the code that was not correctly encoded before is now correctly encoded, and the actual URL that the link contains is:
http://localhost/MyFile.aspx?mything=2&this=4
So it's not a problem that the & character in the code gets encoded; on the contrary, the code is now correct.
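One way to check this claim is to run the encoded markup through an HTML parser and look at the href value it reports. The following is a minimal sketch in Python, assuming the standard library's html.parser module; the names HrefCollector and extract_hrefs are made up for this illustration and are not part of the original answer.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href values of <a> tags as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser passes attribute values with character references
        # already decoded, so &amp; arrives here as a plain &.
        for name, value in attrs:
            if tag == "a" and name == "href":
                self.hrefs.append(value)


def extract_hrefs(markup):
    """Hypothetical helper: return all <a href> values found in markup."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


encoded = '<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">My Text &amp;</a>'
print(extract_hrefs(encoded))
```

The href that comes back contains a plain & again, which is the URL a browser would actually request.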
In PowerShell this could be done as:
$String ='& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
$String -replace '(?<!<[^<>]*)&', "&amp;"
yields
&amp; &amp; &amp; &amp; &amp; <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &amp;</a>
Dissecting the regex:
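- The negative lookbehind (?<! .... ) first checks that the & is not inside a tag, i.e. that it is not preceded by a < that has not been closed yet
- Every & that passes that check is then matched and replaced with &amp;.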
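The lookbehind above is variable-length, which the .NET regex engine used by PowerShell supports but some other engines do not (Python's re module, for example, rejects it). A rough sketch of the same idea for such engines is to match either a whole tag or a bare & and rewrite only the latter. This is a swapped-in technique, not the regex from the answer, and the names TAG_OR_AMP and encode_ampersands_outside_tags are made up for this illustration. Unlike the lookbehind version, it only skips tags that are properly closed with >.

```python
import re

# Alternation: a complete tag is tried first, so any & inside it is
# consumed as part of the tag and never seen as a bare ampersand.
TAG_OR_AMP = re.compile(r"<[^<>]*>|&")


def encode_ampersands_outside_tags(text):
    """Replace every & that is not inside a <...> tag with &amp;."""

    def replace(match):
        token = match.group(0)
        # Tags are returned unchanged; only a lone & is encoded.
        return "&amp;" if token == "&" else token

    return TAG_OR_AMP.sub(replace, text)


sample = '& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
print(encode_ampersands_outside_tags(sample))
```

Like the PowerShell one-liner, this also re-encodes the & of entities that are already present in the text; combining it with the negative lookahead from the question would avoid that.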