remove all HTML formatting from a string
I am trying to compare two strings, but I just realized that one of them already contains some HTML formatting.
How can I get these two strings to match when evaluating string1 == string2? (Note: I don't know up front what the HTML formatting is going to be.)
string1 = "This is a test";
string2 = "<font color=\"black\" size=\"1\">This is a test</font>";
Load the HTML into Html Agility Pack and extract only the text.
string html = "<html><body><div>test</div></body></html>";
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
string text = document.DocumentNode.InnerText;
This will not remove the content of <script> nodes, but you can easily remove the script nodes first.
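A minimal sketch of removing the script nodes before reading the text, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (HtmlTextExtractor, ExtractText) are made up for illustration, and dropping style nodes as well is my own addition:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTextExtractor
{
    // Hypothetical helper: returns the text of the document without script/style content.
    public static string ExtractText(string html)
    {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard against that.
        HtmlNodeCollection unwanted = document.DocumentNode.SelectNodes("//script|//style");
        if (unwanted != null)
        {
            // Iterate over a copy so nodes are not removed while enumerating the collection.
            foreach (HtmlNode node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        return document.DocumentNode.InnerText;
    }

    public static void Main()
    {
        string html = "<html><body><script>var x = 1;</script><div>test</div></body></html>";
        Console.WriteLine(ExtractText(html));
    }
}
```

Note that InnerText may still contain entities such as &amp;, so you might want to decode the result before comparing.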
string newText = System.Text.RegularExpressions.Regex.Replace(OldHtmlTextHere, "<[^>]*>", string.Empty);
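Applied to the two strings from the question, the regex approach could look roughly like this. StripTags is a hypothetical helper name, and this is only a sketch: the pattern simply drops anything between < and >, so it can misfire on a > inside an attribute value or on HTML comments, and it leaves entities like &amp; untouched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexStripExample
{
    // Hypothetical helper: removes anything that looks like an HTML tag.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<[^>]*>", string.Empty);
    }

    public static void Main()
    {
        string string1 = "This is a test";
        string string2 = "<font color=\"black\" size=\"1\">This is a test</font>";

        // Compare the plain string against the tag-stripped one.
        bool match = string1 == StripTags(string2);
        Console.WriteLine(match); // True
    }
}
```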
Check out System.Web.HttpUtility.HtmlDecode, which converts HTML entities such as &amp; back into plain characters.
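A small sketch of what the decoding does. Keep in mind that it only handles entities and does not strip tags, so it would have to be combined with one of the other approaches. On .NET Framework this assumes a reference to System.Web; the sample string is made up.

```csharp
using System;
using System.Web;

public static class DecodeExample
{
    public static void Main()
    {
        string encoded = "Fish &amp; chips &lt;today&gt;";

        // Entities are turned back into the characters they stand for.
        string decoded = HttpUtility.HtmlDecode(encoded);
        Console.WriteLine(decoded); // Fish & chips <today>
    }
}
```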