Remove style tags, CSS, scripts and HTML tags from HTML to plain text

Using regular expressions, how do I remove style tags, CSS, scripts and HTML tags to convert HTML to plain text?

I'm working in ASP.NET with C#.


I don't think a regex is really the right tool for this. That said, the following pattern removes every tag when you run a regex replace (only the tags themselves: the text inside script and style elements is left behind):

<[^>]*>

To use it with Regex.Replace, do the following:

string myHtmlString = "<html><body>my test text</body></html>";

string myPlainTextString = Regex.Replace(myHtmlString, "<[^>]*>", String.Empty);
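Since the question also asks about CSS and scripts, here is a rough sketch of how the regex approach could be extended: drop script and style blocks together with their contents first, then comments, then the remaining tags, and finally decode entities and collapse whitespace. The class name HtmlText and the method name ToPlainText are made up for illustration, and this assumes reasonably well-formed markup; it is not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HtmlText
{
    // <script>...</script> and <style>...</style>, including everything inside them.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // HTML comments, which may span several lines.
    private static readonly Regex Comment = new Regex(
        @"<!--.*?-->", RegexOptions.Singleline);

    // Any remaining tag (the pattern from the answer above).
    private static readonly Regex Tag = new Regex(@"<[^>]*>");

    // Runs of whitespace left behind by the removals.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Order matters: remove script/style blocks before the generic tag
        // pattern, otherwise their contents would survive as text.
        string text = ScriptOrStyle.Replace(html, " ");
        text = Comment.Replace(text, " ");

        // Replace tags with a space so adjacent blocks such as
        // <p>a</p><p>b</p> do not run together.
        text = Tag.Replace(text, " ");

        // Turn entities such as &amp; back into characters.
        text = WebUtility.HtmlDecode(text);

        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Calling HtmlText.ToPlainText(myHtmlString) on the test string above should give "my test text". Replacing tags with a space instead of String.Empty is a trade-off: it keeps block elements apart, but it also splits words that have inline markup in the middle of them.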

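I recommend you use something like the Html Agility Pack instead: http://htmlagilitypack.codeplex.com/

It parses the HTML properly, which makes this easier. Helper methods called "ConvertToPlainText" are, as far as I know, built on top of the library (modelled on its Html2Txt sample) rather than part of its core API. With the core API alone, InnerText returns the text without the tags:

string myHtmlString = "<html><body>my test text</body></html>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(myHtmlString);
string myPlainTextString = doc.DocumentNode.InnerText;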