Stripping MS Word Tags Using Html Agility Pack
I have a DB with some text fields pasted from MS Word, and I'm having trouble stripping just the <div>, <font> and <span> tags while, obviously, keeping their innerText.
I've tried using the HAP, but I'm not going in the right direction:
```vbnet
Public Function StripHtml(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String
    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    ' Select every <div>, <font> and <span> element in the document
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span")
    For Each node In invalidNodes
        ' keepGrandChildren:=False removes the node together with all of its children
        node.ParentNode.RemoveChild(node, False)
    Next
    Return htmlDoc.DocumentNode.WriteTo()
End Function
```
This code selects the desired elements and removes them, but it doesn't keep their inner text.
Thanks in advance!
Well... I think I found a solution:
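```vbnet
' Requires Imports HtmlAgilityPack at the top of the file
Public Function StripHtml(ByVal html As String) As String
    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span|//p")
    ' SelectNodes returns Nothing (not an empty collection) when nothing matches
    If invalidNodes IsNot Nothing Then
        For Each node As HtmlNode In invalidNodes
            ' keepGrandChildren:=True moves the node's children (including text) up into its parent
            node.ParentNode.RemoveChild(node, True)
        Next
    End If
    ' Write only the content of the document root
    Return htmlDoc.DocumentNode.WriteContentTo()
End Function
```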
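I was almost there... :P The key change is passing True as the second argument (keepGrandChildren) of RemoveChild, so the removed element's children, including its text, are moved up into its parent instead of being discarded. I also added //p to the XPath.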
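To see why that second argument matters, here is a minimal sketch that runs the same removal loop with both settings on a tiny fragment. It assumes the HtmlAgilityPack NuGet package is referenced; the module name KeepGrandChildrenDemo and the helper Unwrap are hypothetical names used only for illustration.

```vbnet
' Assumes the HtmlAgilityPack NuGet package is referenced.
Imports System
Imports HtmlAgilityPack

Module KeepGrandChildrenDemo
    ' Hypothetical helper: removes every <div> and <span>, passing the given
    ' keepGrandChildren flag to RemoveChild, and returns the resulting HTML.
    Function Unwrap(ByVal html As String, ByVal keepGrandChildren As Boolean) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div|//span")
        ' SelectNodes returns Nothing when no element matches
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                node.ParentNode.RemoveChild(node, keepGrandChildren)
            Next
        End If
        Return doc.DocumentNode.WriteContentTo()
    End Function

    Sub Main()
        Dim html As String = "<div>Hello <span>world</span></div>"
        ' False: the <div> is removed with its whole subtree, so the text is lost
        Console.WriteLine("[" & Unwrap(html, False) & "]")
        ' True: the children are re-attached to the parent, so the text survives
        Console.WriteLine("[" & Unwrap(html, True) & "]")
    End Sub
End Module
```

With False the text disappears together with the <div>, which is the behaviour described in the question; with True only the tags are removed and "Hello world" remains.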