Regex gives compiler error

<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>

Above is the regex I took from the question "Remove all empty HTML tags?", and I am trying to use it in C# as follows:

string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";

I get many compile errors on that line, such as "Newline in constant" and "Unrecognized escape sequence".

Could anybody point out what I am missing?


Inside a verbatim string (@"..."), each double quote must be written as "":

string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
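A quick way to check this fix is to compare the verbatim literal with the equivalent regular (escaped) literal. Inside @"..." each "" becomes one " and backslashes are kept as-is, while in a regular literal the quote is written \" and every regex backslash must be doubled. The sketch below (variable names are illustrative) checks that both spellings produce the same pattern and that Regex accepts it:

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim literal: "" stands for one double quote, backslashes are literal.
string verbatim = @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";

// Regular literal: \" for a double quote and \\ for every regex backslash.
string regular = "<(\\w+)\\b(?:\\s+[\\w\\-.:]+(?:\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[\\w\\-.:]+))?)*\\s*/?>\\s*</\\1\\s*>";

// Both literals spell exactly the same pattern.
Console.WriteLine(verbatim == regular); // True

// The Regex constructor throws ArgumentException if the pattern is invalid.
var emptyTag = new Regex(verbatim);
Console.WriteLine(emptyTag.IsMatch("<a href=\"#top\"> </a>")); // True
```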


You have double quotes inside the regex that need to be escaped.

 string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";

should be

string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:\u0022[^\u0022]*\u0022|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
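This variant works for a different reason than the "" version: a verbatim literal does not process escape sequences, so the string keeps the six characters \u0022, and it is the .NET regex engine that reads \uXXXX as U+0022, the double quote. A minimal check, with illustrative variable names:

```csharp
using System;
using System.Text.RegularExpressions;

string withUnicode = @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:\u0022[^\u0022]*\u0022|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
string withDoubled = @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";

// The verbatim literal leaves \u0022 untouched, so the two strings differ...
Console.WriteLine(withUnicode.Contains(@"\u0022")); // True
Console.WriteLine(withUnicode == withDoubled);      // False

// ...but the regex engine treats \u0022 as ", so both patterns match the same input.
string input = "<span title=\"x y\"></span>";
Console.WriteLine(Regex.IsMatch(input, withUnicode)); // True
Console.WriteLine(Regex.IsMatch(input, withDoubled)); // True
```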

By the way, because of the </\1\s*> part, this will only remove balanced pairs of tags that contain nothing but whitespace. It will match <p> </p> but not <img src=bogus onerror=alert(1337)>.
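To see that scope concretely, the sketch below runs the corrected pattern against a few inputs: an open/close pair separated only by whitespace matches, while a lone tag with no closing tag, or a pair with real content, does not.

```csharp
using System;
using System.Text.RegularExpressions;

var emptyTag = new Regex(@"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>");

string[] samples =
{
    "<p> </p>",                            // open/close pair around whitespace
    "<img src=bogus onerror=alert(1337)>", // single tag, no closing tag
    "<p>text</p>",                         // pair with real content
};

foreach (string s in samples)
    Console.WriteLine($"{s} -> {emptyTag.IsMatch(s)}");
```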

Even if all you want is to remove balanced tags around whitespace, be aware that this pattern will not match all such tags. Specifically, it will not match pairs whose tag names differ in case, such as <p> </P>.
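If mixed-case pairs such as <p> </P> should also be treated as matching, one option is to pass RegexOptions.IgnoreCase, which in .NET also makes the backreference \1 compare case-insensitively. A minimal sketch:

```csharp
using System;
using System.Text.RegularExpressions;

const string Pattern = @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";
const string Mixed = "<p> </P>";

// Default: \1 must repeat the captured name with the same case, so "P" does not match "p".
Console.WriteLine(Regex.IsMatch(Mixed, Pattern)); // False

// IgnoreCase also applies to the backreference comparison.
Console.WriteLine(Regex.IsMatch(Mixed, Pattern, RegexOptions.IgnoreCase)); // True
```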

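Finally, it will not remove tags that only become empty once their children are removed: a single pass turns <i><b></b></i> into <i></i>, which is itself empty but is left in place.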
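One way to handle nested empty tags is to repeat the replacement until the string stops changing. This is a sketch, not something from the answers; RemoveEmptyTags is a hypothetical helper name:

```csharp
using System;
using System.Text.RegularExpressions;

var emptyTag = new Regex(@"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>");

// Hypothetical helper: each pass removes the innermost empty pairs,
// which may leave their parents empty for the next pass.
string RemoveEmptyTags(string html)
{
    string previous;
    do
    {
        previous = html;
        html = emptyTag.Replace(html, "");
    } while (html != previous);
    return html;
}

Console.WriteLine(emptyTag.Replace("<i><b></b></i>", ""));            // <i></i>
Console.WriteLine(RemoveEmptyTags("<div><i><b></b></i>text</div>")); // <div>text</div>
```

The loop always terminates, because every pass that changes the string makes it strictly shorter.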


In verbatim strings, a double quote (") has to be escaped by doubling it ("").

Try this:

string regex= @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>";