
Sorting XML with XSLT - entire XML-schema is not known

I am wondering whether XSLT makes it possible to sort an XML file if I don't know the entire XML-schema.

For example I would like to sort the following XML file.

Sort /CATALOG/CD elements by /CATALOG/CD/TITLE

<CATALOG attrib1="value1">
  <DVD2>
    <TITLE>The Godfather2</TITLE>
  </DVD2>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD attrib4="value4">
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>
      <CATALOG>
        <CD><TITLE>E</TITLE></CD>
        <CD><TITLE>I</TITLE></CD>
        <CD><TITLE>D</TITLE></CD>
      </CATALOG>
    </PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD attrib2="value2">
    <TITLE attrib3="value3">Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <DVD>
    <TITLE>The Godfather1</TITLE>
  </DVD>
</CATALOG>

The output should be:

<CATALOG attrib1="value1">
  <CD attrib4="value4">
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>
      <CATALOG>
        <CD><TITLE>E</TITLE></CD>
        <CD><TITLE>I</TITLE></CD>
        <CD><TITLE>D</TITLE></CD>
      </CATALOG>
    </PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD attrib2="value2">
    <TITLE attrib3="value3">Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <DVD2>
    <TITLE>The Godfather2</TITLE>
  </DVD2>
  <DVD>
    <TITLE>The Godfather1</TITLE>
  </DVD>
</CATALOG>

The following is one of my many attempts:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!--<CATALOG>-->
    <xsl:for-each select="CATALOG/CD">
      <xsl:sort select="TITLE" />
      <xsl:copy-of select="."/>
    </xsl:for-each>
    <!--</CATALOG>-->
  </xsl:template>
</xsl:stylesheet>

The problem is that, with this XSLT, XML parts outside the CD list are not displayed.

I could uncomment the two commented-out parts of code, but that's exactly what I want to avoid.

In that case if any attributes are added to the CATALOG element, they would not be copied to output XML.

I don't want to re-build the XML file: I just want to do a sort knowing exact information only about some part of the XML-schema.

This functionality is easy to implement with, for example, .NET (using XmlDocument and XmlNode objects) or Python's lxml library, but is it possible with XSLT?

Thanks!

Note: It is not easy to find a sample input XML which will avoid misunderstanding the question in all cases. But I will try to detail the problem as much as I can:

  • only CD elements right under CATALOG should be sorted (for example CD elements under the Bob Dylan section should be left untouched)
  • it is all the same whether elements other than CD (for example DVD and DVD2) are in the beginning or end of the list
  • no elements, attributes, values, comments, so nothing should be missing from the output XML
  • non-CD elements (for example DVD and DVD2) should not be sorted by the TITLE subelement


Is this a job for the identity transform? It can be used to copy XML whose schema is not known:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

I think all you need is to add a second template that matches the CATALOG element and overrides the default copying there (in your case, to sort the CD elements):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="CATALOG">
      <xsl:copy>
         <xsl:apply-templates select="@*" />
         <xsl:apply-templates select="CD">
            <xsl:sort select="TITLE"/>
         </xsl:apply-templates>
         <xsl:apply-templates select="*[local-name() != 'CD']" />
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

So, when matching CATALOG, you can still copy any attributes and any non-CD child elements without explicitly knowing their names. Note that if there are DVD elements under CATALOG, for example, they will all be moved after the sorted CD elements.
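
One caveat with the stylesheet above: match="CATALOG" also matches the CATALOG nested inside PRICE, so its CD children are sorted too, and *[local-name() != 'CD'] selects only elements, so comments, processing instructions, and text directly under CATALOG are not copied. A minimal sketch of a variant that addresses both points, assuming CATALOG is always the document element and the input uses no namespaces (the question does not state either beyond the sample):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copies every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: only the document element CATALOG should be reordered.
       A CATALOG nested deeper (e.g. inside PRICE) does not match this
       pattern and falls through to the identity template. -->
  <xsl:template match="/CATALOG">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Direct CD children, sorted by TITLE -->
      <xsl:apply-templates select="CD">
        <xsl:sort select="TITLE"/>
      </xsl:apply-templates>
      <!-- All other child nodes (elements, comments, processing
           instructions, text) in their original document order -->
      <xsl:apply-templates select="node()[not(self::CD)]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes between the children are copied along with the other non-CD nodes, so they collect after the sorted CD elements and the layout may look uneven. Adding xsl:strip-space together with indent="yes" on xsl:output is one way to tidy that up, at the cost of discarding those whitespace-only nodes.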


Keeping on the line of just modifying the identity transformation (which might not be really safe), I think that the following should be equivalent to @Tim's answer.

NOTE: I'm not recommending this technique unless you understand the general behavior of the identity transformation.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* 
                | node()[not(self::CD[parent::CATALOG])]"/>
            <xsl:apply-templates select="CD[parent::CATALOG]">
                <xsl:sort select="TITLE"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Or, if you want the other elements (DVD and DVD2) to come after the sorted CD elements, as in the expected output, you can do:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="CD[parent::CATALOG]">
                <xsl:sort select="TITLE"/>
            </xsl:apply-templates>
            <xsl:apply-templates select="node()
                [not(self::CD[parent::CATALOG])]"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>


To copy all attributes of the CATALOG element, you can write:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="CATALOG">
        <xsl:copy>
            <xsl:copy-of select="@*"/>

            <xsl:copy-of select="CD[1]/preceding-sibling::*"/>
            <xsl:for-each select="CD">
                <xsl:sort select="TITLE"/>
                <xsl:copy-of select="."/>
            </xsl:for-each>
            <xsl:copy-of select="CD[last()]/following-sibling::*"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Result:

<CATALOG atr1="value1" atr2="value2">
    <DVD>
        <FORMAT>DVD-9</FORMAT>
    </DVD>
    <CD>
        <TITLE>1999 Grammy Nominees</TITLE>
        <ARTIST>Many</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Grammy</COMPANY>
        <PRICE>10.20</PRICE>
        <YEAR>1999</YEAR>
    </CD>
    <CD>
        <TITLE>Big Willie style</TITLE>
        <ARTIST>Will Smith</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1997</YEAR>
    </CD>
    ...
    <BLUERAY>
        <TITLE>Contact</TITLE>
        <YEAR>1997</YEAR>
    </BLUERAY>
</CATALOG>


Try this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="CATALOG">
    <xsl:copy>
      <xsl:apply-templates select="@* | *">
        <xsl:sort select="TITLE" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

This is the standard copy template match="@* | node()" plus a special case for the CATALOG node where a sort criterion is specified.

Note that the select in the apply-templates of the second template (@* | *) differs slightly from the one in the standard copy template. The standard selector also includes text nodes; text nodes have no TITLE element, so the sort puts them all first, which looks a bit weird (try it and see).
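
Because every child element is sorted by TITLE here, DVD and DVD2 are reordered along with the CD elements, which the question asks to avoid. A hedged sketch of one way to keep a single sorted apply-templates while leaving the non-CD elements in document order after the CDs is to use two sort keys. It assumes CATALOG is the document element and, like the stylesheet above, it selects only element children, so comments and text directly under CATALOG are not copied:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copies every node and attribute unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/CATALOG">
    <xsl:copy>
      <!-- Attributes are applied separately so they are written
           before any child element -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="*">
        <!-- First key: 0 for CD, 1 for anything else, so CDs come first -->
        <xsl:sort select="number(not(self::CD))" data-type="number"/>
        <!-- Second key: TITLE for CD elements, empty for the rest.
             Nodes that tie on every key keep document order. -->
        <xsl:sort select="self::CD/TITLE"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The second key is empty for every non-CD element, so those elements tie on both keys and XSLT 1.0 leaves them in their original relative order.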
