Filter XML based on child nodes
I have an XML file similar to this (with more nodes and details removed):
<?xml version="1.0" encoding="utf-8"?>
<Message xmlns="http://www.theia.org.uk/ILR/2011-12/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header>
<CollectionDetails>
<Collection>ILR</Collection>
<Year>1112</Year>
<FilePreparationDate>2011-10-06</FilePreparationDate>
</CollectionDetails>
<Source>
<ProtectiveMarking>PROTECT-PRIVATE</ProtectiveMarking>
</Source>
</Header>
<SourceFiles>
<SourceFile>
<SourceFileName>A10004705001112004401.ER</SourceFileName>
<FilePreparationDate>2011-10-05</FilePreparationDate>
</SourceFile>
</SourceFiles>
<LearningProvider>
<UKPRN>10004705</UKPRN>
<UPIN>107949</UPIN>
</LearningProvider>
<Learner>
<ULN>4682272097</ULN>
<GivenNames>Peter</GivenNames>
<LearningDelivery>
<LearnAimRef>60000776</LearnAimRef>
</LearningDelivery>
<LearningDelivery>
<LearnAimRef>ZPROG001</LearnAimRef>
</LearningDelivery>
</Learner>
<Learner>
<ULN>3072094321</ULN>
<GivenNames>Thomas</GivenNames>
<LearningDelivery>
<LearnAimRef>10055320</LearnAimRef>
</LearningDelivery>
<LearningDelivery>
<LearnAimRef>10002856</LearnAimRef>
</LearningDelivery>
<LearningDelivery>
<LearnAimRef>1000287X</LearnAimRef>
</LearningDelivery>
</Learner>
</Message>
I need to filter this so that only Learner records that have a child LearningDelivery with a LearnAimRef of ZPROG001 are kept. In this case the output would contain the first learner but not the second:
<?xml version="1.0" encoding="utf-8"?>
<Message xmlns="http://www.theia.org.uk/ILR/2011-12/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header>
<CollectionDetails>
<Collection>ILR</Collection>
<Year>1112</Year>
<FilePreparationDate>2011-10-06</FilePreparationDate>
</CollectionDetails>
<Source>
<ProtectiveMarking>PROTECT-PRIVATE</ProtectiveMarking>
</Source>
</Header>
<SourceFiles>
<SourceFile>
<SourceFileName>A10004705001112004401.ER</SourceFileName>
<FilePreparationDate>2011-10-05</FilePreparationDate>
</SourceFile>
</SourceFiles>
<LearningProvider>
<UKPRN>10004705</UKPRN>
<UPIN>107949</UPIN>
</LearningProvider>
<Learner>
<ULN>4682272097</ULN>
<GivenNames>Peter</GivenNames>
<LearningDelivery>
<LearnAimRef>60000776</LearnAimRef>
</LearningDelivery>
<LearningDelivery>
<LearnAimRef>ZPROG001</LearnAimRef>
</LearningDelivery>
</Learner>
</Message>
I have looked into this and believe the correct approach is to use an XSL transform to process the XML and write the result to a new file (I am doing this in C#). After a couple of hours trying to wrap my head around the XSLT syntax I am still stuck and can't get the output I want. Any help is much appreciated.
To copy most of an XML source document, modifying only certain parts, you'll want to start with an identity transform. This just copies everything. Then add a template to override the identity template for the <Learner> elements you don't want to copy:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
<!-- identity template -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- override the above template for certain Learner elements; output nothing. -->
<xsl:template match="theia:Learner[
not(theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001')]">
</xsl:template>
</xsl:stylesheet>
(borrowing namespace prefix from @andyb).
If you just want all the <Learner> elements that have a descendant (in this case LearnAimRef) with a particular value, you can use a predicate expression (the bit between the [ and ]) to filter the node-set.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
<xsl:template match="/theia:Message">
<xsl:copy-of select="theia:Learner[theia:LearningDelivery/theia:LearnAimRef='ZPROG001']"/>
</xsl:template>
</xsl:stylesheet>
So the copy-of reads as: copy all the Learner nodes that have a child called LearningDelivery, which in turn has a child called LearnAimRef whose value equals ZPROG001.
Your XML document has a default namespace of "http://www.theia.org.uk/ILR/2011-12/1". In XPath 1.0 an unprefixed name only matches elements in no namespace, so for the XPath to select your nodes it has to refer to the same namespace. In the above XSLT I have bound your namespace to the prefix theia and used that prefix in the XPath.
If you want other parts of the XML source copied to the output tree, you can add further instructions, for example <xsl:copy-of select="theia:LearningProvider"/>.
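As a rough sketch of that idea, the stylesheet below extends the copy-of approach so that the output keeps the Message wrapper and the non-Learner sections shown in the question. It assumes that Header, SourceFiles and LearningProvider are the only other children of Message that need to be kept; the real file has more nodes, so the select list would need adjusting. The xsl:output settings are also just an assumption about the desired formatting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">

  <!-- Assumed output settings; adjust or remove as needed -->
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <xsl:template match="/theia:Message">
    <!-- xsl:copy recreates the Message element together with its
         namespace declarations, but not its attributes or children -->
    <xsl:copy>
      <!-- copy any attributes on Message -->
      <xsl:copy-of select="@*"/>
      <!-- copy the sections to keep unchanged (assumed list) -->
      <xsl:copy-of select="theia:Header | theia:SourceFiles | theia:LearningProvider"/>
      <!-- copy only the learners that have a ZPROG001 learning aim -->
      <xsl:copy-of select="theia:Learner[theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001']"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If the list of sections to keep is long or may change, the identity-transform approach is probably easier to maintain, because it copies everything by default and only names what to drop.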
This does not cover applying the transformation in C#, but that has already been answered: see "How to apply an XSLT Stylesheet in C#".
Hope this helps :)