Regex gives compiler error
<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>
The regex above is taken from the question "Remove all empty HTML tags?", and I am trying to use it as shown below:
string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
I get many compile errors on that line, such as "newline in constant" and "unrecognized escape sequence".
Could anybody point out what I am missing?
You need to use "" for double quotes inside the string:
string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
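As a rough check that the doubled quotes compile and still match, here is a minimal sketch. The class name EmptyTagDemo, the method RemoveEmptyTags, and the sample input are assumptions made up for illustration; only the pattern comes from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyTagDemo
{
    // Verbatim string: every " inside the pattern is written as "".
    public const string Pattern =
        @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";

    // Hypothetical helper: removes each match of the pattern in one pass.
    public static string RemoveEmptyTags(string html)
    {
        return Regex.Replace(html, Pattern, "");
    }

    public static void Main()
    {
        // Sample input invented for this sketch (regular string, so " is \").
        string html = "<div><p class=\"x\"> </p>text</div>";
        Console.WriteLine(RemoveEmptyTags(html)); // <div>text</div>
    }
}
```

The compiler turns each "" into a single " before the regex engine sees the pattern, so the pattern is identical to the original one.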
You have double quotes inside the regexp that need to be quoted.
string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
should be
string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:\u0022[^\u0022]*\u0022|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
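The \u0022 variant works for a slightly different reason than "": a verbatim string does not process escapes, so the six characters \u0022 reach the .NET regex engine unchanged, and the engine itself reads them as the double-quote character. A small sketch of that behaviour, with the class and method names chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapeDemo
{
    // The regex engine, not the C# compiler, interprets \u0022 here.
    public static bool MatchesQuote(string input)
    {
        return Regex.IsMatch(input, @"^\u0022$");
    }

    public static void Main()
    {
        // In a verbatim string the escape stays as six literal characters.
        Console.WriteLine(@"\u0022".Length);   // 6
        Console.WriteLine(MatchesQuote("\"")); // True
    }
}
```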
By the way, because of the trailing </\1\s*>, this will only remove balanced tags surrounding whitespace. It will match <p> </p> but not <img src=bogus onerror=alert(1337)>.
Even if all you want to do is remove balanced tags around whitespace, be aware that this will not match all such tags. Specifically, it will not match tags whose name differs in case between the opening and closing tag, such as <p> </P>.
Finally, it will not remove transitively empty tags: a single pass only reduces <i><b></b></i> to <i></i>, leaving an empty tag behind.
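One possible way to address the last two limitations, offered as a sketch and not as part of the original answer: pass RegexOptions.IgnoreCase so the backreference \1 compares case-insensitively, and repeat the replacement until the text stops changing. The names EmptyTagStripper and StripAll are hypothetical. Tags that are not balanced, such as the img example, are still left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyTagStripper
{
    // Same pattern as in the thread; IgnoreCase is an added assumption so
    // that a closing </P> can pair with an opening <p>.
    private static readonly Regex EmptyTag = new Regex(
        @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>",
        RegexOptions.IgnoreCase);

    // Repeats the replacement until a pass removes nothing, so tags that
    // only become empty after an inner tag is removed are caught too.
    public static string StripAll(string html)
    {
        string previous;
        do
        {
            previous = html;
            html = EmptyTag.Replace(html, "");
        } while (html != previous);
        return html;
    }

    public static void Main()
    {
        Console.WriteLine(StripAll("<i><b></b></i>x")); // x
        Console.WriteLine(StripAll("<p> </P>y"));
    }
}
```

The loop always terminates because every pass that changes the string makes it shorter.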
A single double quote ( " ) has to be escaped as two double quotes ( "" ) in a verbatim string.
Try this:
string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";