How to find nodes with the same children

I have the following XML. Note that nodes n1 and n3 have the same children (the order can differ). How can I write an XSL transformation to identify such nodes?

<Document>
    <Node name="n1">
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
    </Node>
    <Node name="n2">
        <Item value="p1"/>
        <Item value="p2"/>
        <Item value="p3"/>
    </Node>
    <Node name="n3">
        <Item value="v3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
</Document>


Here is an attempt to do it with XSLT 1.0:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:param name="sep" select="' '"/>

  <xsl:output indent="yes"/>

  <xsl:template match="Node">
    <Node name="{@name}">
      <xsl:attribute name="matches">
        <xsl:apply-templates 
          select="../Node[not(generate-id() = generate-id(current()))]
                         [count(Item) = count(current()/Item)]
                         [not(Item[not(@value = current()/Item/@value)])]"
           mode="check"/>
      </xsl:attribute>
    </Node>
  </xsl:template>

  <xsl:template match="Node" mode="check">
    <xsl:if test="position() &gt; 1">
      <xsl:value-of select="$sep"/>
    </xsl:if>
    <xsl:value-of select="@name"/>
  </xsl:template>

</xsl:stylesheet>

When running the stylesheet with Saxon 6.5.5 against the sample input

<Document>
    <Node name="n1">
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
    </Node>
    <Node name="n2">
        <Item value="p1"/>
        <Item value="p2"/>
        <Item value="p3"/>
    </Node>
    <Node name="n3">
        <Item value="v3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n4">
        <Item value="p3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n5">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
    </Node> 
    <Node name="n6">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node> 
    <Node name="n7">
        <Item value="v1"/>
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node> 
</Document>

I get the following result:

<Node name="n1" matches="n3 n5"/>

<Node name="n2" matches=""/>

<Node name="n3" matches="n1 n5"/>

<Node name="n4" matches=""/>

<Node name="n5" matches="n1 n3"/>

<Node name="n6" matches=""/>

<Node name="n7" matches=""/>


Here is a complete XSLT 1.0 solution. It is general enough to produce correct results even when a Node is allowed to have children with any name, and it relies on the EXSLT node-set() extension function:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common" exclude-result-prefixes="ext">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kNodeBySign" match="Node" use="@signature"/>

 <xsl:template match="/*">
     <xsl:variable name="vrtfPass1">
      <xsl:apply-templates/>
     </xsl:variable>

     <xsl:apply-templates mode="pass2"
          select="ext:node-set($vrtfPass1)"/>
 </xsl:template>

 <xsl:template match="Node">
  <Node name="{@name}">
   <xsl:variable name="vSignature">
    <xsl:for-each select="*">
     <xsl:sort select="name()"/>
     <xsl:sort select="@value"/>
      <xsl:value-of select="concat(name(),'+++',@value)"/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:attribute name="signature">
   <xsl:value-of select="$vSignature"/>
  </xsl:attribute>
  </Node>
 </xsl:template>

 <xsl:template match="/" mode="pass2">
  <xsl:for-each select=
    "Node[generate-id()
         =
          generate-id(key('kNodeBySign',@signature)[1])
         ]
    ">

    <Node name="{@name}">
      <xsl:variable name="vNodesInGroup">
        <xsl:for-each select=
          "key('kNodeBySign',@signature)[position()>1]">
          <xsl:value-of select="concat(@name, ' ')"/>
        </xsl:for-each>
      </xsl:variable>

      <xsl:attribute name="matches">
       <xsl:value-of select="$vNodesInGroup"/>
      </xsl:attribute>
    </Node>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when applied on this XML document:

<Document>
    <Node name="n1">
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
    </Node>
    <Node name="n2">
        <Item value="p1"/>
        <Item value="p2"/>
        <Item value="p3"/>
    </Node>
    <Node name="n3">
        <Item value="v3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n4">
        <Item value="p3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n5">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
    </Node>
    <Node name="n6">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node>
    <Node name="n7">
        <Item value="v1"/>
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node>
</Document>

the wanted, correct result is produced:

<Node name="n1" matches="n3 n5 "/>
<Node name="n2" matches=""/>
<Node name="n4" matches=""/>
<Node name="n6" matches=""/>
<Node name="n7" matches=""/>
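
Note the trailing space in matches="n3 n5 ": the inner xsl:for-each appends a space after every name. If that matters, one possible adjustment (a sketch, not part of the original answer) is to emit the separator before every name except the first, by replacing the vNodesInGroup variable with:

```xml
<xsl:variable name="vNodesInGroup">
  <xsl:for-each select="key('kNodeBySign',@signature)[position() > 1]">
    <!-- position() here counts within the selected remainder of the group,
         so the separator is skipped only for its first member -->
    <xsl:if test="position() > 1">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="@name"/>
  </xsl:for-each>
</xsl:variable>
```

The rest of the stylesheet stays unchanged.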

Explanation:

  1. This is a two-pass transformation.

  2. The result of the first pass is an XML fragment containing Node elements, each with its name attribute and one newly added attribute, signature. The signature is the concatenation of the names and values of all children in a normalized form: the children are sorted by name and then by value, so the order in which they appear does not affect it.

  3. In pass 2 we use the Muenchian method to group all Node elements by their signature attribute. Only the first Node in each group is written to the output, with a new attribute matches whose value is the space-delimited concatenation of the name attributes of the remaining Node elements in that group.
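
For reference, the intermediate tree held in $vrtfPass1 can be worked out by hand from the stylesheet and the input above. It is never serialized, so the layout below is only illustrative, but the attribute values should be as follows:

```xml
<Node name="n1" signature="Item+++v1Item+++v2Item+++v3"/>
<Node name="n2" signature="Item+++p1Item+++p2Item+++p3"/>
<Node name="n3" signature="Item+++v1Item+++v2Item+++v3"/>
<Node name="n4" signature="Item+++p3Item+++v1Item+++v2"/>
<Node name="n5" signature="Item+++v1Item+++v2Item+++v3"/>
<Node name="n6" signature="Item+++v1Item+++v2Item+++v3Item+++v4"/>
<Node name="n7" signature="Item+++v1Item+++v1Item+++v2Item+++v3Item+++v4"/>
```

n1, n3 and n5 share a signature, which is why pass 2 reports n3 and n5 as matches of n1.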


The XSLT 2.0 stylesheet

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output indent="yes"/>

  <xsl:template match="Node">
    <Node name="{@name}" matches="{../Node[not(. is current())][every $item in current()/Item satisfies $item/@value = ./Item/@value]/@name}"/>
  </xsl:template>

</xsl:stylesheet>

when applied to

<Document>
    <Node name="n1">
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
    </Node>
    <Node name="n2">
        <Item value="p1"/>
        <Item value="p2"/>
        <Item value="p3"/>
    </Node>
    <Node name="n3">
        <Item value="v3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n4">
        <Item value="p3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n5">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
    </Node>    
</Document>

outputs

<Node name="n1" matches="n3 n5"/>
<Node name="n2" matches=""/>
<Node name="n3" matches="n1 n5"/>
<Node name="n4" matches=""/>
<Node name="n5" matches="n1 n3"/>


If all the child Items have to match, then I think Martin's solution needs to be modified to use

<Node name="{@name}" 
      matches="{../Node[not(. is current())]
                        [count(Item) = count(current()/Item)]
                         [every $c in Item/@value 
                             satisfies 
                                $c = current()/Item/@value
                         ]
                          /@name
               }"/>

It's possible this doesn't meet the requirement exactly: for example, it will match a Node with values v1,v2,v2 if there is another Node with values v1,v2,v3. But since the requirement wasn't specified very precisely, I've had to guess a little.

(Note Martin's solution is XSLT 1.0, whereas mine is XSLT 2.0. I'm not going to the effort of writing XSLT 1.0 code unless people explicitly say that's what they need.)
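
If duplicates such as v1,v2,v2 must not match v1,v2,v3, one option is to compare the sorted value sequences directly. The following is only a sketch under that assumption, not part of either answer above; the function name f:sorted-values and its namespace URI are made up for illustration.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:local-functions"
  exclude-result-prefixes="xs f"
  version="2.0">

  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: the Item values of a Node as sorted strings,
       with duplicates kept -->
  <xsl:function name="f:sorted-values" as="xs:string*">
    <xsl:param name="node" as="element(Node)"/>
    <xsl:perform-sort select="$node/Item/@value/string()">
      <xsl:sort select="."/>
    </xsl:perform-sort>
  </xsl:function>

  <xsl:template match="Node">
    <xsl:variable name="mine" select="f:sorted-values(.)"/>
    <!-- deep-equal() is true only if both sequences have the same length
         and are pairwise equal, so duplicates are counted -->
    <Node name="{@name}"
          matches="{../Node[not(. is current())]
                           [deep-equal(f:sorted-values(.), $mine)]/@name}"/>
  </xsl:template>

</xsl:stylesheet>
```

Because both sides are sorted with the same collation before the comparison, the order of the Item children does not matter, while their multiplicity does.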


This is my (HUMBLE) XSLT 1.0 approach. It is not as elegant as the others, and it may fail in many cases, but for the input in the question it does the job, and it can be customized to some degree.

The nodes are compared by means of a pattern built by a named template, build-pattern. In your case the pattern is built from the value attributes of a node's Item children, sorted; that is, the first node is compared against a pattern like v1v2v3. The way the pattern is built can be changed according to the requirements.

<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"    
    version="1.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="Node">

        <xsl:variable name="current">
            <xsl:call-template name="build-pattern">
                <xsl:with-param name="node" select="."/>
            </xsl:call-template>
        </xsl:variable>

        <xsl:copy>
            <xsl:copy-of select="@name"/>
            <xsl:attribute name="matches">

                <xsl:for-each select="../Node[not(generate-id()
                    = generate-id(current()))]">

                    <xsl:variable name="node">
                        <xsl:call-template name="build-pattern">
                            <xsl:with-param name="node" select="."/>
                        </xsl:call-template>
                    </xsl:variable>

                    <xsl:if test="$current=$node">
                        <xsl:value-of select="@name"/>
                    </xsl:if>

                </xsl:for-each>

            </xsl:attribute>
        </xsl:copy>

    </xsl:template>

    <xsl:template name="build-pattern">
        <xsl:param name="node"/>
        <xsl:for-each select="$node/Item">
            <xsl:sort select="@value"/>
            <xsl:value-of select="@value"/>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

When applied on this input:

<Document>
    <Node name="n1">
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
    </Node>
    <Node name="n2">
        <Item value="p1"/>
        <Item value="p2"/>
        <Item value="p3"/>
    </Node>
    <Node name="n3">
        <Item value="v3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n4">
        <Item value="p3"/>
        <Item value="v1"/>
        <Item value="v2"/>
    </Node>
    <Node name="n5">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
    </Node> 
    <Node name="n6">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node> 
    <Node name="n7">
        <Item value="v1"/>
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node> 
</Document>

Produces:

<Node name="n1" matches="n3n5"></Node>
<Node name="n2" matches=""></Node>
<Node name="n3" matches="n1n5"></Node>
<Node name="n4" matches=""></Node>
<Node name="n5" matches="n1n3"></Node> 
<Node name="n6" matches=""></Node> 
<Node name="n7" matches=""></Node>
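
One caveat: the pattern concatenates the values without a separator, so a node with values v1 and v2v3 gets the same pattern (v1v2v3) as a node with values v1v2 and v3. A possible hardening of build-pattern, sketched here under the assumption that the character | never occurs in a value attribute, is:

```xml
<xsl:template name="build-pattern">
    <xsl:param name="node"/>
    <xsl:for-each select="$node/Item">
        <xsl:sort select="@value"/>
        <!-- '|' is an assumed separator; pick any character that
             cannot occur in @value -->
        <xsl:value-of select="concat(@value, '|')"/>
    </xsl:for-each>
</xsl:template>
```

With this change the first node's pattern becomes v1|v2|v3| and the rest of the stylesheet works as before.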

I've generalized the above transform to find nodes with the same children:

  • of any name
  • in any order
  • with any number of attributes, of any name

One would need only to replace the named template build-pattern with the following two:

<xsl:template name="build-pattern">
    <xsl:param name="node"/>
    <xsl:for-each select="$node/*">
        <xsl:sort select="name()"/>
        <xsl:sort select="@*[1]"/>
        <xsl:value-of select="name()"/>
        <xsl:apply-templates select="attribute::*">
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<xsl:template match="@*">
    <xsl:value-of select="concat(name(),.)"/>
</xsl:template>

For example, when the new transform is applied to the following document:

<Document>
    <Node name="n1">
        <Item value="v1" x="a2"/>
        <foo value="v2" x="a1"/>
        <Item value="v3"/>
    </Node>
    <Node name="n2">
        <Item value="p1"/>
        <Item value="p2"/>
        <Item value="p3"/>
    </Node>
    <Node name="n3">
        <Item value="v3"/>
        <Item value="v1" x="a2"/>
        <foo value="v2" x="a1"/>
    </Node>
    <Node name="n4">
        <Item value="v3"/>
        <Item value="v1"/>
        <xxxx value="v2"/>
    </Node>
    <Node name="n5">
        <xxxx value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
    </Node> 
    <Node name="n6">
        <Item value="v2"/>
        <Item value="v1"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node> 
    <Node name="n7">
        <Item value="v1"/>
        <Item value="v1"/>
        <Item value="v2"/>
        <Item value="v3"/>
        <Item value="v4"/>
    </Node> 
</Document>

Produces:

<Node name="n1" matches="n3"/>
<Node name="n2" matches=""/>
<Node name="n3" matches="n1"/>
<Node name="n4" matches="n5"/>
<Node name="n5" matches="n4"/>
<Node name="n6" matches=""/>
<Node name="n7" matches=""/>
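
The generalized build-pattern relies on @*[1] and on the order in which attributes are visited, and XPath 1.0 leaves the relative order of attribute nodes implementation-dependent. A more defensive variant, sketched under the assumption that every child carries a value attribute to sort on, could sort the attributes by name:

```xml
<xsl:template name="build-pattern">
    <xsl:param name="node"/>
    <xsl:for-each select="$node/*">
        <xsl:sort select="name()"/>
        <!-- assumption: @value exists on every child and is a usable sort key -->
        <xsl:sort select="@value"/>
        <xsl:value-of select="name()"/>
        <!-- visit the attributes in name order instead of processor order -->
        <xsl:apply-templates select="@*">
            <xsl:sort select="name()"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>
```

The match="@*" template is used unchanged. Children that tie on both sort keys keep their document order, so further xsl:sort keys are needed if such ties can occur.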