Removing an element by class name with HtmlAgilityPack (C#)
I'm using the Html Agility Pack to read the contents of my HTML document into a string. Once that is done, I would like to remove certain elements from that content by their class, but I have run into a problem.
My Html looks like this:
<div id="wrapper">
    <div class="maincolumn">
        <div class="breadCrumbContainer">
            <div class="breadCrumbs">
            </div>
        </div>
        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>
        Content goes here...
    </div>
</div>
Now, I have used an XPath selector to get all the content within the <div id="wrapper"> element and read its InnerHtml property like so:
node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
if (node != null)
{
    pageContent = node.InnerHtml;
}
From this point, I would like to remove the div with the class "breadCrumbContainer". However, when I use the code below, I get the error: "Node "" was not found in the collection"
node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
node = node.RemoveChild(node.SelectSingleNode("//div[@class='breadCrumbContainer']"));
if (node != null)
{
    pageContent = node.InnerHtml;
}
Can anyone shed some light on this, please? I'm quite new to XPath, and really new to the Html Agility Pack library.
Thanks,
Dave
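That's because RemoveChild can only remove a direct child of the node it is called on, not a grandchild. Here the breadCrumbContainer div is a child of the maincolumn div, not of the wrapper div. Remove it through its own parent instead: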
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='breadCrumbContainer']");
node.ParentNode.RemoveChild(node);
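If more than one element can carry the class, or the class attribute can hold several class names, the same idea can be extended. The following is only a sketch under some assumptions: the helper name RemoveByClass and its parameters are made up for illustration, the search is limited to the wrapper by using a relative ".//" path (a bare "//" starts from the document root even when called on a child node), and the usual XPath 1.0 contains/concat idiom is swapped in for the exact @class comparison.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class BreadcrumbRemover
{
    // Hypothetical helper: returns the wrapper's inner HTML after removing
    // every descendant <div> whose class list contains className.
    public static string RemoveByClass(string html, string wrapperId, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode wrapper = doc.DocumentNode.SelectSingleNode("//div[@id='" + wrapperId + "']");
        if (wrapper == null)
        {
            return string.Empty; // no wrapper found, so there is no content to return
        }

        // ".//" keeps the search inside the wrapper node.
        string xpath = ".//div[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection matches = wrapper.SelectNodes(xpath);
        if (matches != null)
        {
            // Iterate over a copy so removal never touches the collection being enumerated.
            foreach (HtmlNode match in matches.ToList())
            {
                match.Remove(); // detaches the node from its actual parent
            }
        }

        return wrapper.InnerHtml;
    }
}
```

With the markup from the question, calling BreadcrumbRemover.RemoveByClass(html, "wrapper", "breadCrumbContainer") should give back the wrapper's content without the breadcrumb div, which can then be assigned to pageContent.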
This is a super-simple task for XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match=
"div[@class='breadCrumbContainer'
and
ancestor::div[@id='wrapper']
]
"/>
</xsl:stylesheet>
When this transformation is applied to the provided XML document (with another <div> added and everything wrapped in an <html> top element to make it more challenging and realistic):
<html>
<div id="wrapper">
<div class="maincolumn" >
<div class="breadCrumbContainer">
<div class="breadCrumbs"></div>
</div>
<div class="seo_list">
<div class="seo_head">Header</div>
</div> Content goes here...
</div>
</div>
<div>
Something else here
</div>
</html>
the wanted, correct result is produced:
<html>
<div id="wrapper">
<div class="maincolumn">
<div class="seo_list">
<div class="seo_head">Header</div>
</div> Content goes here...
</div>
</div>
<div>
Something else here
</div>
</html>
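To run a transformation like this from C#, something along the following lines could work. It is a sketch rather than a definitive implementation: the class and method names (XsltRunner, Apply) are hypothetical, it relies on the standard System.Xml.Xsl.XslCompiledTransform class, and it assumes the input is already well-formed XML. Real-world HTML often is not, so it may need to be converted first.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Hypothetical helper: applies the stylesheet text to the input XML text
    // and returns the transformed markup as a string.
    public static string Apply(string stylesheetXml, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader stylesheetReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings carries the stylesheet's <xsl:output> choices
        // (here: no XML declaration, indented output).
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }

        return output.ToString();
    }
}
```

Passing the stylesheet and the sample document above as strings to XsltRunner.Apply should return the markup without the breadCrumbContainer div.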