VB.Net or C# to strip html but leave less than or greater than

I have a string variable that contains the following html data:

<p> <em><strong>This is some <span style="background-color: rgb(255, 255, 0);">rich </span>text. 3 < 5 is a valid statement. <br /> </strong></em></p>

I need to strip out the HTML but keep any less-than or greater-than signs, in case the data contains mathematical expressions (like the "3 < 5" portion of the string). I cannot use third-party applications or tools because of restrictions on our site, and would prefer to use only what is in the .NET Framework version 3.5. I have tried the following regular expressions, but they do not handle the less-than/greater-than symbols.

<[^>]*>

<[^>]+>

<(.|\n)*?>

\<[^\>]*\>

I have also tried the code on this link, but it does not handle the less-than/greater-than symbols either.

Any suggestions are greatly appreciated.


Replace all text matching this pattern with an empty string:

(<[^<>]*>)+

(I tested it on Rubular.com, but it should work for C# too.)

Apparently, in C#, with RegexObj constructed from that pattern and html holding the string from the question, the code should be

RegexObj.Replace(html, "")
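For completeness, here is a minimal C# sketch of how the pattern above might be applied to the string from the question, using only System.Text.RegularExpressions from the .NET Framework. The class name HtmlStripper and the method name StripTags are hypothetical, and the final Trim() is an assumption about the desired output rather than part of the original suggestion.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Pattern from the answer above: one or more consecutive runs of
    // '<', any characters other than '<' or '>', then '>'.
    private static readonly Regex TagRun = new Regex(@"(<[^<>]*>)+");

    // Hypothetical helper: removes tag-like runs, then trims the
    // whitespace left at both ends (the Trim is an added assumption).
    public static string StripTags(string html)
    {
        return TagRun.Replace(html, "").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<p> <em><strong>This is some <span style=\"background-color: rgb(255, 255, 0);\">rich </span>text. 3 < 5 is a valid statement. <br /> </strong></em></p>";

        // The lone '<' in "3 < 5" survives because the next angle bracket
        // after it is another '<', which [^<>]* refuses to cross.
        Console.WriteLine(HtmlStripper.StripTags(html));
        // prints: This is some rich text. 3 < 5 is a valid statement.
    }
}
```

One limitation worth keeping in mind: the pattern cannot tell a tag from a lone '<' that is later followed by a lone '>' with no other angle bracket in between, so in text such as "3 < 5 and 7 > 2" the span "< 5 and 7 >" would be removed as if it were a tag.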