Remove Encoded HTML from Strings using RegEx

I currently have an extension method for removing any HTML from strings.

Regex.Replace(s, @"<(.|\n)*?>", string.Empty);

This works fine on the whole. However, I am occasionally passed strings that contain both standard HTML markup and encoded markup (I don't control the source data, so I can't correct things at the point of entry), e.g.

&lt;p&gt;<p>Sample text</p>&lt;/p&gt;

I need an expression that will remove both encoded and non-encoded HTML (whether it be paragraph tags, anchor tags, formatting tags etc.) from a string.


I think you can do this in two passes with the same extension method.

First strip the usual un-encoded tags, then decode the returned string and run the same replacement again.
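
A minimal sketch of those two passes, assuming `System.Net.WebUtility.HtmlDecode` is available for the decoding step (`HttpUtility.HtmlDecode` would serve the same purpose in an ASP.NET project). The class name `StringExtensions` and the method name `StripHtml` are made up for illustration; the pattern is the one from the question.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Same pattern as in the question: a lazy match of anything between < and >.
    private const string TagPattern = @"<(.|\n)*?>";

    // Hypothetical extension method name, not from the original post.
    public static string StripHtml(this string s)
    {
        if (string.IsNullOrEmpty(s))
        {
            return s;
        }

        // Pass 1: remove the ordinary, un-encoded tags.
        string stripped = Regex.Replace(s, TagPattern, string.Empty);

        // Decode entities so that &lt;p&gt; becomes <p>.
        string decoded = WebUtility.HtmlDecode(stripped);

        // Pass 2: remove the tags that the decode step exposed.
        return Regex.Replace(decoded, TagPattern, string.Empty);
    }
}
```

Calling `"&lt;p&gt;<p>Sample text</p>&lt;/p&gt;".StripHtml()` should leave just `Sample text`. Note that the decode step also turns every other entity into its character, so text that legitimately contains an encoded `&lt;` followed later by `&gt;` would be treated as a tag and removed by the second pass.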
