
Flattening an xml file using ref attributes

I'd like to flatten an XML data file (note that this is not a schema, .xsd, file) programmatically using C# (so an external XML editor won't work, unless it has an API). For example, given this tree structure:

<root>
    <A>
        <B att="val">
            <C>
                someData
            </C>
        </B>
    </A>
    <A>
        <B>
             someOtherData
        </B>
        <B>
            moreData
        </B>
    </A>
</root>

I'd like to flatten it to:

<root>
    <A>
        <B ref="b1" />
    </A>
    <A>
        <B ref="b2" />
        <B ref="b3" />
    </A>
    <B id="b1" att="val">
         <C ref="c1" />
    </B>
    <B id="b2">
        someOtherData
    </B>
    <B id="b3">
        moreData
    </B>
    <C id="c1">
         someData
    </C>
</root>

Is there a way to achieve this using C#?

And is there a way to transform the flat XML back to the tree structure? I'd like something as generic as possible, so that any XML file could be flattened this way.

There is a similar question on SO, but it doesn't deal with refs.


This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="Lower" select=
  "'abcdefghijklmnopqrstuvwxyz'"
  />

 <xsl:variable name="vUpper" select=
  "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"
  />

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/*">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
    <xsl:apply-templates select="/*/*//*" mode="extract">
     <xsl:sort select="count(ancestor::*)" data-type="number"/>
    </xsl:apply-templates>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*[ancestor::*[2]]">
   <xsl:variable name="vPos">
     <xsl:number level="any"/>
   </xsl:variable>

   <xsl:element name="{name()}">
     <xsl:attribute name="ref">
       <xsl:value-of select=
        "concat(translate(name(),$vUpper,$Lower),$vPos)"/>
     </xsl:attribute>
   </xsl:element>
 </xsl:template>

 <xsl:template match="*" mode="extract">
  <xsl:variable name="vPos">
   <xsl:number level="any"/>
  </xsl:variable>

  <xsl:element name="{name()}">
    <xsl:attribute name="id">
       <xsl:value-of select=
        "concat(translate(name(),$vUpper,$Lower),$vPos)"/>
    </xsl:attribute>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>

When applied to the provided XML document:

<root>
    <A>
        <B att="val">
            <C>
                someData
            </C>
        </B>
    </A>
    <A>
        <B>
             someOtherData
        </B>
        <B>
            moreData
        </B>
    </A>
</root>

produces exactly the wanted, correct result:

<root>
   <A>
      <B ref="b1"/>
   </A>
   <A>
      <B ref="b2"/>
      <B ref="b3"/>
   </A>
   <B id="b1" att="val">
      <C ref="c1"/>
   </B>
   <B id="b2">
             someOtherData
        </B>
   <B id="b3">
            moreData
        </B>
   <C id="c1">
                someData
            </C>
</root>
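You can also express the same flattening rule without XSLT. The Python below uses only the standard xml.etree.ElementTree module. In C#, you could walk the tree the same way with System.Xml.Linq, or run the stylesheet above with XslCompiledTransform.

This is a minimal sketch, not a port, and it makes several assumptions:
- flatten, _keep and sample are hypothetical names.
- Whitespace-only text is dropped, as xsl:strip-space does.
- Comments and processing instructions are lost, because ElementTree's default parser skips them.
- IDs are the lower-cased tag plus a per-tag counter in document order, imitating <xsl:number level="any"/>.
- It needs Python 3.8+, where ElementTree keeps attributes in insertion order.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def _keep(text):
    # Mimic <xsl:strip-space elements="*"/>: drop whitespace-only text.
    return text if text and text.strip() else None


def flatten(xml_text):
    root = ET.fromstring(xml_text)

    # Hypothetical ID scheme: lower-cased tag + per-tag counter in document
    # order, similar to <xsl:number level="any"/> (b1, b2, b3, c1, ...).
    counters = defaultdict(int)
    ids = {}
    for el in root.iter():
        counters[el.tag] += 1
        ids[el] = f"{el.tag.lower()}{counters[el.tag]}"

    def shallow_copy(el, extra=None):
        # Copy tag, attributes and text; replace each child element by an
        # empty <child ref="..."/> placeholder.
        attrs = dict(extra or {})
        attrs.update(el.attrib)
        new = ET.Element(el.tag, attrs)
        new.text = _keep(el.text)
        for child in el:
            ref = ET.SubElement(new, child.tag, {"ref": ids[child]})
            ref.tail = _keep(child.tail)
        return new

    def walk(el, depth):
        # Pre-order (document order) traversal that also reports depth.
        yield depth, el
        for child in el:
            yield from walk(child, depth + 1)

    flat = ET.Element(root.tag, root.attrib)
    for top in root:  # children of the root stay in place
        flat.append(shallow_copy(top))

    # Everything nested deeper becomes a definition with an id. sorted() is
    # stable, so elements at equal depth keep their document order, like
    # <xsl:sort select="count(ancestor::*)"/>.
    nested = [(d, el) for top in root for d, el in walk(top, 1) if d >= 2]
    for _, el in sorted(nested, key=lambda pair: pair[0]):
        flat.append(shallow_copy(el, {"id": ids[el]}))
    return flat


sample = ("<root><A><B att='val'><C>someData</C></B></A>"
          "<A><B>someOtherData</B><B>moreData</B></A></root>")
print(ET.tostring(flatten(sample), encoding="unicode"))
```

Definitions are appended in order of depth, so b1, b2 and b3 come before c1, as in the stylesheet's output.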

The reverse transformation is:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:key name="kElbyId" match="*" use="@id"/>

 <xsl:template match="node()|@*">
   <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="*[@ref]">
  <xsl:apply-templates mode="deepen"
       select="key('kElbyId',@ref)"/>
 </xsl:template>

 <xsl:template match="*[@id]"/>
 <xsl:template match="*[@id]" mode="deepen">
  <xsl:copy>
   <xsl:apply-templates
        select="@*[not(name()='id')] | node()"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

When this reverse transformation is applied to the result of the flattening transformation above, the initial XML document is produced:

<root>
   <A>
      <B att="val">
         <C>
                someData
            </C>
      </B>
   </A>
   <A>
      <B>
             someOtherData
        </B>
      <B>
            moreData
        </B>
   </A>
</root>
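The reverse direction is a lookup-and-expand. First index the definitions by id, then replace every ref placeholder with a copy of its definition, recursively. The sketch below roughly mirrors the stylesheet above.

This minimal Python sketch assumes the flat format shown in this answer:
- unflatten and flat_xml are hypothetical names.
- Every ref must resolve; otherwise a KeyError is raised.
- References must not form a cycle, since the recursion would never terminate.

```python
import xml.etree.ElementTree as ET


def unflatten(flat):
    # Index every definition by its id, like <xsl:key name="kElbyId" .../>.
    by_id = {el.get("id"): el for el in flat.iter() if el.get("id") is not None}

    def expand(el):
        tail = el.tail                    # keep the placeholder's trailing text
        if el.get("ref") is not None:
            el = by_id[el.get("ref")]     # follow the reference
        attrs = {k: v for k, v in el.attrib.items() if k != "id"}
        new = ET.Element(el.tag, attrs)
        new.text, new.tail = el.text, tail
        for child in el:
            new.append(expand(child))     # refs inside definitions too
        return new

    tree = ET.Element(flat.tag, flat.attrib)
    tree.text = flat.text
    for top in flat:
        if top.get("id") is None:         # drop definitions, like *[@id]
            tree.append(expand(top))
    return tree


flat_xml = ("<root><A><B ref='b1'/></A><A><B ref='b2'/><B ref='b3'/></A>"
            "<B id='b1' att='val'><C ref='c1'/></B><B id='b2'>someOtherData</B>"
            "<B id='b3'>moreData</B><C id='c1'>someData</C></root>")
print(ET.tostring(unflatten(ET.fromstring(flat_xml)), encoding="unicode"))
```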


This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="vUppercase" select="'QWERTYUIOPASDFGHJKLZXCVBNM'"/>
    <xsl:variable name="vLowercase" select="'qwertyuiopasdfghjklzxcvbnm'"/>
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:if test="parent::*/parent::*">
                <xsl:attribute name="id">
                    <xsl:value-of select="translate(name(),
                                                    $vUppercase,
                                                    $vLowercase)"/>
                    <xsl:number level="any"/>
                </xsl:attribute>
            </xsl:if>
            <xsl:apply-templates mode="ref"/>
        </xsl:copy>
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="*" mode="ref">
        <xsl:copy>
            <xsl:attribute name="ref">
                <xsl:value-of select="translate(name(),
                                                $vUppercase,
                                                $vLowercase)"/>
                <xsl:number level="any"/>
            </xsl:attribute>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="text()"/>
    <xsl:template match="/*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output:

<root>
    <A>
        <B ref="b1" />
    </A>
    <B att="val" id="b1">
        <C ref="c1" />
    </B>
    <C id="c1">
                    someData
    </C>
    <A>
        <B ref="b2" />
        <B ref="b3" />
    </A>
    <B id="b2">
                 someOtherData
    </B>
    <B id="b3">
                moreData
    </B>
</root>
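The practical difference from the first stylesheet is the emission order. Each definition is written immediately after its parent's, and id is appended after the element's own attributes.

Here is a hedged Python sketch of that pre-order variant. flatten_preorder and sample are hypothetical names. For brevity, it drops whitespace-only text and ignores any text that follows a child element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def flatten_preorder(xml_text):
    root = ET.fromstring(xml_text)

    # Hypothetical per-tag numbering, imitating <xsl:number level="any"/>.
    counters = defaultdict(int)
    ids = {}
    for el in root.iter():
        counters[el.tag] += 1
        ids[el] = f"{el.tag.lower()}{counters[el.tag]}"

    flat = ET.Element(root.tag, root.attrib)

    def emit(el, depth):
        attrs = dict(el.attrib)
        if depth >= 2:                   # like test="parent::*/parent::*"
            attrs["id"] = ids[el]        # appended after original attributes
        new = ET.SubElement(flat, el.tag, attrs)
        if el.text and el.text.strip():  # whitespace-only text is dropped
            new.text = el.text
        for child in el:                 # placeholders first ...
            ET.SubElement(new, child.tag, {"ref": ids[child]})
        for child in el:                 # ... then each child's definition
            emit(child, depth + 1)

    for top in root:
        emit(top, 1)
    return flat


sample = ("<root><A><B att='val'><C>someData</C></B></A>"
          "<A><B>someOtherData</B><B>moreData</B></A></root>")
print(ET.tostring(flatten_preorder(sample), encoding="unicode"))
```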

The reverse stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kElementById" match="*[@id]" use="@id"/>
    <xsl:key name="kElementByRef" match="*[@ref]" use="@ref"/>
    <xsl:template match="node()|@*" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[key('kElementByRef',@id)]|
                         *[key('kElementByRef',@id)]/@id"/>
    <xsl:template match="*[@ref]">
        <xsl:for-each select="key('kElementById',@ref)">
            <xsl:call-template name="identity"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Output:

<root>
    <A>
        <B att="val">
            <C>
                someData
            </C>
        </B>
    </A>
    <A>
        <B>
             someOtherData
        </B>
        <B>
            moreData
        </B>
    </A>
</root>
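This reverse stylesheet only removes elements (and id attributes) that some ref actually points to. As a result, an unreferenced element carrying its own id survives the round trip.

Below is a minimal Python sketch of that rule using xml.etree.ElementTree. unflatten_referenced and flat_xml are hypothetical names. The sketch assumes every ref resolves and that references are acyclic.

```python
import xml.etree.ElementTree as ET


def unflatten_referenced(flat):
    by_id = {e.get("id"): e for e in flat.iter() if e.get("id") is not None}
    # Ids that some placeholder points at, like the kElementByRef key.
    referenced = {e.get("ref") for e in flat.iter() if e.get("ref") is not None}

    def rebuild(el):
        tail = el.tail
        if el.get("ref") is not None:
            el = by_id[el.get("ref")]
        # Strip the id only when it was used as a reference target.
        attrs = {k: v for k, v in el.attrib.items()
                 if not (k == "id" and v in referenced)}
        new = ET.Element(el.tag, attrs)
        new.text, new.tail = el.text, tail
        for child in el:
            new.append(rebuild(child))
        return new

    tree = ET.Element(flat.tag, flat.attrib)
    tree.text = flat.text
    for top in flat:
        if top.get("id") not in referenced:  # unreferenced elements survive
            tree.append(rebuild(top))
    return tree


flat_xml = "<root><A><B ref='b1'/></A><B id='b1' att='val'>x</B><D id='keep'/></root>"
print(ET.tostring(unflatten_referenced(ET.fromstring(flat_xml)), encoding="unicode"))
```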


You're probably better off going with @Alejandro's or @Dimitre's stylesheets, but I wanted to post mine since I finished a working version:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="xml" indent="yes"/>
   <xsl:template match="/*">
      <xsl:copy>
         <!-- copy any non-elements -->
         <xsl:copy-of select="@* | node()[not(self::*)]"/>
         <!-- transform descendant elements -->
         <xsl:apply-templates select=".//*" mode="define" />
      </xsl:copy>
   </xsl:template>

   <xsl:template match="*" mode="define">
      <xsl:copy>
         <xsl:attribute name="id"><xsl:value-of select="generate-id()"/></xsl:attribute>
         <xsl:copy-of select="@*" />
         <xsl:apply-templates select="node()" />
      </xsl:copy>
   </xsl:template>

   <xsl:template match="*">
      <xsl:copy>
         <xsl:attribute name="ref"><xsl:value-of select="generate-id()"/></xsl:attribute>
      </xsl:copy>
   </xsl:template>

   <!-- Copy text, comments and processing instructions inside definitions -->
   <xsl:template match="text() | comment() | processing-instruction()">
      <xsl:copy/>
   </xsl:template>

</xsl:stylesheet>

Note: I did not try to preserve

  • the particular id patterns you used. My assumption is that you don't care what the IDs are, as long as they're unique and stable. If this assumption is incorrect, the previous two answers show how to generate the IDs according to your pattern.
  • the order in which you generated the element definitions, though the order of the original document should be recoverable from my output.
  • the fact that your top-level elements don't have id attributes. That would be an easy enough feature to add, as the other answers have done. But hopefully it is not necessary: top-level elements are identifiable as such because there are no references to them.
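The last point can be checked mechanically. Collect every ref value; any definition whose id is not in that set was a top-level element. Because this stylesheet emits definitions in document order, the returned list also preserves the original order of the top-level elements.

Here is a hedged Python sketch. top_level_ids and flat_xml are hypothetical names. The d0e3-style values are copied from the sample output; generate-id() values in general differ between XSLT processors.

```python
import xml.etree.ElementTree as ET


def top_level_ids(flat):
    # Every id that some ref points at belongs to a nested element.
    referenced = {e.get("ref") for e in flat.iter() if e.get("ref") is not None}
    # Definitions nobody references were top-level in the original tree;
    # iterating the root's children keeps their document order.
    return [e.get("id") for e in flat
            if e.get("id") is not None and e.get("id") not in referenced]


flat_xml = ("<root><A id='d0e3'><B ref='d0e5'/></A>"
            "<B id='d0e5' att='val'><C ref='d0e7'/></B><C id='d0e7'>someData</C>"
            "<A id='d0e12'><B ref='d0e14'/><B ref='d0e17'/></A>"
            "<B id='d0e14'>someOtherData</B><B id='d0e17'>moreData</B></root>")
print(top_level_ids(ET.fromstring(flat_xml)))
```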

When running my stylesheet on your sample input, I get this output (the spacing is ugly but I'm not going to fix it since you have other good answers):

<?xml version="1.0" encoding="utf-8"?>
<root>



   <A id="d0e3">

      <B ref="d0e5"/>

   </A>
   <B id="d0e5" att="val">

      <C ref="d0e7"/>

   </B>
   <C id="d0e7">
            someData
         </C>
   <A id="d0e12">

      <B ref="d0e14"/>

      <B ref="d0e17"/>

   </A>
   <B id="d0e14">
         someOtherData
      </B>
   <B id="d0e17">
         moreData
      </B>
</root>