How to fix HTML with HTML Agility Pack
I have hundreds of ASPX files which I need to refactor a bit. I have several occurrences of this code:
<td style="text-align: right;">
<span class="frmFldLbl">Task (or some other text)</span>
</td>
and all the frmFldLbl class does is define a color and text size. So I want to change the above to this:
<td class="frmFldLbl">
Task (or some other text)
</td>
Much cleaner! And it will function the same because I'll add text-align: right; to the frmFldLbl class definition as well.
Right now, I'm only worried about getting this working for one file; then I'll add the directory recursion and all that good stuff. I'm using the HTML Agility Pack to parse an HTML file, and I'm able to use XPath to select the spans I'm targeting for refactoring.
What I need to be able to do, and haven't figured out, is how to insert text into the children of the <td> in the correct spot. I would RTFM if I could find TFM, but it doesn't appear to be very well documented. Here's what I've come up with (it throws an exception). How do I insert the text in the correct spot?
Dim doc As New HtmlDocument()
doc.Load(fileName)
Dim culpritNodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//td/span[@class='frmFldLbl']")
If culpritNodes IsNot Nothing Then
    For Each culpritNode As HtmlNode In culpritNodes
        Dim culpritNodeIndex As Int32 = culpritNode.ParentNode.ChildNodes.IndexOf(culpritNode)
        Dim culpritNodeText As String = culpritNode.InnerHtml
        Dim parentTdClassAtt As HtmlAttribute = culpritNode.ParentNode.Attributes("class")
        If Not parentTdClassAtt.Value.Contains("frmFldLbl") Then
            If Not String.IsNullOrEmpty(parentTdClassAtt.Value) Then parentTdClassAtt.Value += " "
            parentTdClassAtt.Value += "frmFldLbl"
        End If
        Dim replacementNode As New HtmlNode(HtmlNodeType.Text, doc, 0)
        replacementNode.InnerHtml = culpritNodeText
        culpritNode.ParentNode.ChildNodes.Insert(culpritNodeIndex, replacementNode)
        culpritNode.Remove()
    Next
End If
doc.Save(fileName)
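For anyone landing here with the same problem, here is a minimal sketch of one way the replacement could be done. It assumes the exception comes from the <td> having no class attribute yet (so Attributes("class") returns Nothing), which the question does not confirm. The module and function names (SpanUnwrapSketch, UnwrapLabelSpans) are made up for illustration, and it works on a string instead of a file so it is easy to try out.

```vb
Imports HtmlAgilityPack

Module SpanUnwrapSketch
    ' Hypothetical helper: copies the frmFldLbl class onto the parent <td>
    ' and replaces each matching <span> with its own contents.
    Function UnwrapLabelSpans(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim spans As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//td/span[@class='frmFldLbl']")
        ' SelectNodes returns Nothing when nothing matches.
        If spans Is Nothing Then Return html

        For Each span As HtmlNode In spans
            Dim td As HtmlNode = span.ParentNode

            ' GetAttributeValue falls back to the supplied default when the
            ' attribute is missing, so a <td> without a class is handled.
            Dim classes As String = td.GetAttributeValue("class", "")
            If Not classes.Contains("frmFldLbl") Then
                td.SetAttributeValue("class", (classes & " frmFldLbl").Trim())
            End If

            ' Put a text node holding the span's inner markup where the span was.
            td.ReplaceChild(doc.CreateTextNode(span.InnerHtml), span)
        Next

        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

ReplaceChild keeps the new node in the span's position, so there is no need to track the child index by hand. The sketch leaves the style attribute on the <td> alone; dropping it is a separate decision, since some cells may carry other inline styles.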
ASPX files aren't HTML files. Using the HTML Agility Pack to do this is probably not the best approach. Have you tested whether <%...%> expressions round-trip correctly through the HTML Agility Pack?
An easier approach would be to use the Replace feature in Visual Studio with a regular expression. Clicking 'replace' 100 times would be a lot easier than writing and debugging this code.
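The regex will look something like this (in Visual Studio's Find and Replace syntax, where :q matches a quoted string, :Wh matches whitespace, and {} tags an expression for reuse as \1):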
Find:
\<td style=:q\>\n:Wh*\<span class={:q}\>
Replace:
\<td class=\1\>
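If you would rather script it than click through the dialog, roughly the same idea can be sketched with .NET's Regex class. This is an adaptation, not a literal translation: the pattern below is my own, it also consumes the closing </span> and </td> (the Find/Replace pair above only rewrites the opening tags), and the module and function names are hypothetical.

```vb
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper: collapses <td style="..."><span class="X">text</span></td>
    ' into <td class="X">text</td>.
    Function CollapseLabelCells(markup As String) As String
        ' $1 captures the quoted class value, $2 captures the span's contents.
        Dim pattern As String =
            "<td\s+style=""[^""]*"">\s*<span\s+class=(""[^""]*"")>(.*?)</span>\s*</td>"
        ' Singleline lets .*? span line breaks inside the span.
        Return Regex.Replace(markup, pattern, "<td class=$1>$2</td>", RegexOptions.Singleline)
    End Function
End Module
```

As written, this matches any styled <td> that directly wraps a classed <span>, not just frmFldLbl ones, so review the matches (or tighten the class part of the pattern) before running it over hundreds of files.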