开发者

Stripping MS Word Tags Using Html Agility Pack

I have a DB with some text fields pasted from MS Word, and I'm having trouble to strip just the , and tags, but obviously keeping their innerText.

I've tried using the HAP but I'm not going in the right direction..

Public Function StripHtml(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String
    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span")
    For Each node In invalidNodes
        node.ParentNode.RemoveChild(node, False)
    Next
    Return htmlDoc.DocumentNode.WriteTo()
End Function

This code simply selects the desired elements and removes them... but not keeping their inner text..

Thanks in adva开发者_开发技巧nce


Well... I think I found a solution:

Public Function StripHtml(ByVal html As String) As String
    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span|//p")
    For Each node In invalidNodes
        node.ParentNode.RemoveChild(node, True)
    Next
    Return htmlDoc.DocumentNode.WriteContentTo
End Function

I was almost there... :P

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜