remove all HTML formatting from a string

I am trying to compare two strings, but I just realized that one of them already contains some HTML formatting.

How can I get these two strings to match when doing string1 == string2? (Note: I don't know up front what the HTML formatting is going to be.)

string string1 = "This is a test";
string string2 = "<font color=\"black\" size=\"1\">This is a test</font>";


Load the HTML into Html Agility Pack and extract only the text.

// Requires the HtmlAgilityPack package and "using HtmlAgilityPack;"
string html = "<html><body><div>test</div></body></html>";
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
string text = document.DocumentNode.InnerText;

This will not remove the content of <script> nodes, but you can easily remove the script nodes first.
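Removing the script nodes first could look roughly like the sketch below. It assumes the HtmlAgilityPack package is referenced; the class name HtmlTextExtractor and the method name GetVisibleText are made up for illustration and are not part of the library.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTextExtractor
{
    // Hypothetical helper: returns the text of the document without <script> content.
    public static string GetVisibleText(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Copy the matches to a list first so the tree is not modified
        // while Descendants() is still being enumerated.
        var scriptNodes = document.DocumentNode.Descendants("script").ToList();
        foreach (var node in scriptNodes)
        {
            node.Remove();
        }

        return document.DocumentNode.InnerText;
    }
}
```

The returned text can then be compared against the plain string with ==.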


string newText = System.Text.RegularExpressions.Regex.Replace(OldHtmlTextHere, "<[^>]*>", string.Empty);


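Check out System.Web.HttpUtility.HtmlDecode. It converts HTML entities such as &amp; back into plain characters; it does not remove tags, so use it together with one of the tag-stripping approaches above.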
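The regex and entity-decoding suggestions can be combined into one helper, sketched below under a few assumptions. The names HtmlText and ToPlainText are invented for this example, and it uses System.Net.WebUtility.HtmlDecode in place of System.Web.HttpUtility.HtmlDecode so that no reference to System.Web is needed. The regex only handles simple markup like the example in the question; it is not a full HTML parser.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: strips anything that looks like a tag,
    // then decodes entities such as &amp; and trims surrounding whitespace.
    public static string ToPlainText(string html)
    {
        string withoutTags = Regex.Replace(html, "<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }
}
```

With the two strings from the question, HtmlText.ToPlainText(string1) == HtmlText.ToPlainText(string2) evaluates to true. Normalizing both sides means it does not matter which of the two strings carries the markup.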
