
How to fix HTML with HTML Agility Pack

I have hundreds of ASPX files which I need to refactor a bit. I have several occurrences of this code:

<td style="text-align: right;">
  <span class="frmFldLbl">Task (or some other text)</span>
</td>

and all the frmFldLbl class does is define a color and text size, so I want to change the above to this:

<td class="frmFldLbl">
  Task (or some other text)
</td>

Much cleaner! It will also render the same, because I'll add text-align: right; to the frmFldLbl class definition.

Right now I'm only worried about getting this working for one file; then I'll add the directory recursion and all that good stuff. I'm using the HTML Agility Pack to parse the file, and I can already use XPath to select the spans I'm targeting for refactoring.

What I need to do, and haven't figured out, is how to insert text into the children of the <td> at the correct position. I would RTFM if I could find TFM, but the library doesn't appear to be very well documented. Here's what I've come up with (it throws an exception). How do I insert the text in the correct spot?

    Dim doc As New HtmlDocument()
    doc.Load(fileName)
    Dim culpritNodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//td/span[@class='frmFldLbl']")

    If culpritNodes IsNot Nothing Then
        For Each culpritNode As HtmlNode In culpritNodes

            Dim culpritNodeIndex As Int32 = culpritNode.ParentNode.ChildNodes.IndexOf(culpritNode)
            Dim culpritNodeText As String = culpritNode.InnerHtml
            Dim parentTdClassAtt As HtmlAttribute = culpritNode.ParentNode.Attributes("class")

            If Not parentTdClassAtt.Value.Contains("frmFldLbl") Then

                If Not String.IsNullOrEmpty(parentTdClassAtt.Value) Then parentTdClassAtt.Value += " "
                parentTdClassAtt.Value += "frmFldLbl"

            End If

            Dim replacementNode As New HtmlNode(HtmlNodeType.Text, doc, 0)
            replacementNode.InnerHtml = culpritNodeText
            culpritNode.ParentNode.ChildNodes.Insert(culpritNodeIndex, replacementNode)
            culpritNode.Remove()

        Next
    End If

    doc.Save(fileName)
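A minimal sketch of the same loop, assuming the HtmlAgilityPack package is referenced; the names LabelRefactor, FixLabelCells and FixFile are hypothetical. In the loop above, the exception probably comes from the <td> having only a style attribute: Attributes("class") then returns Nothing, so .Value throws. The sketch reads the class with GetAttributeValue (which takes a default), writes it with SetAttributeValue, and swaps the <span> for a node from CreateTextNode using ReplaceChild, so no index bookkeeping is needed.

```vbnet
Imports HtmlAgilityPack

' Hypothetical helper module; assumes the HtmlAgilityPack package is referenced.
Module LabelRefactor

    ' Turns <td style="text-align: right;"><span class="frmFldLbl">text</span></td>
    ' into <td class="frmFldLbl">text</td> for every match in the document.
    Sub FixLabelCells(doc As HtmlDocument)
        Dim spans As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//td/span[@class='frmFldLbl']")
        If spans Is Nothing Then Return ' SelectNodes returns Nothing when nothing matches

        For Each span As HtmlNode In spans
            Dim td As HtmlNode = span.ParentNode

            ' Read the class with a default: the <td> usually has no class attribute,
            ' and Attributes("class") would return Nothing in that case.
            Dim cls As String = td.GetAttributeValue("class", "")
            If Not cls.Contains("frmFldLbl") Then
                td.SetAttributeValue("class", (cls & " frmFldLbl").Trim())
            End If

            ' Drop the inline style only when it is just the right alignment
            ' that is moving into the frmFldLbl CSS class.
            Dim style As String = td.GetAttributeValue("style", "").Replace(" ", "").TrimEnd(";"c)
            If style = "text-align:right" Then td.Attributes.Remove("style")

            ' Replace the <span> with a text node holding its contents, in the same position.
            Dim textNode As HtmlTextNode = doc.CreateTextNode(span.InnerHtml)
            td.ReplaceChild(textNode, span)
        Next
    End Sub

    ' Load, fix, and save a single file in place.
    Sub FixFile(fileName As String)
        Dim doc As New HtmlDocument()
        doc.Load(fileName)
        FixLabelCells(doc)
        doc.Save(fileName)
    End Sub

End Module
```

Iterating over the SelectNodes result while editing the tree is safe here, because that collection is separate from the ChildNodes lists that ReplaceChild modifies.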


ASPX files aren't HTML files, so using the HTML Agility Pack for this is probably not the best approach. Have you tested whether <%...%> expressions round-trip correctly through it?

An easier approach would be Visual Studio's Find and Replace with a regular expression. Clicking Replace 100 times would be a lot easier than writing and debugging this code.

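The regex will look something like this (in the Find and Replace syntax of Visual Studio 2010 and earlier, :q matches a quoted string, :Wh matches whitespace, and {} tags an expression that \1 refers back to):

Find:

    \<td style=:q\>\n:Wh*\<span class={:q}\>

Replace:

    \<td class=\1\>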
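Visual Studio 2012 and later use .NET regular expressions in Find and Replace, so the :q and :Wh tokens above no longer apply there. A minimal sketch of the same idea with System.Text.RegularExpressions follows; the module name LabelRegex is hypothetical, and the pattern assumes the style attribute is exactly text-align: right; and that the span contains no nested </span>. Unlike the Visual Studio pattern above, which rewrites only the opening tags and leaves the closing </span> behind, it replaces the whole cell in one pass (and collapses the whitespace around the label).

```vbnet
Imports System.Text.RegularExpressions

' Hypothetical helper module illustrating the regex approach with .NET syntax.
Module LabelRegex

    ' Matches a right-aligned <td> whose only content is a frmFldLbl <span>.
    ' Group 1 captures the span's contents so they can be kept.
    Private Const LabelCellPattern As String =
        "<td\s+style=""text-align:\s*right;?""\s*>\s*<span\s+class=""frmFldLbl""\s*>(.*?)</span>\s*</td>"

    ' Singleline lets (.*?) span line breaks; IgnoreCase tolerates <TD>/<SPAN>.
    Private ReadOnly LabelCell As New Regex(LabelCellPattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    Function FixLabelCells(markup As String) As String
        Return LabelCell.Replace(markup, "<td class=""frmFldLbl"">$1</td>")
    End Function

End Module
```

Applying it to a file is then one File.ReadAllText/File.WriteAllText round trip; check the files' encoding first, since File.WriteAllText writes UTF-8 by default.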