Splitting an XML file into multiple files using h1 and id
I'm an XSLT noob. I'm transforming an XML file into HTML. The resulting files will take the form of .inc files to be used as Server Side Includes. For now, I need to split the XML file at the h1 node and write it to multiple .inc files (containing everything between each h1 node) using the h1 id for the filename. The h1 id takes the form of a 'scriptLabel'. Right now, the document splits out ok - BUT simply writes the h1 itself and ignores the content after. What am I doing wrong?
Here's sample XML:
`<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document SYSTEM "RRfront150610.dtd">
<document>
<section charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
scriptlabel="Gov-chairman-intro">
<h1 charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
scriptlabel="Gov-chairman-intro">chairman’s
introduction</h1>
<p charstyle="No Style" pagenum="56"
parastyle="Gov–Head-B-CI" scriptlabel="">
<strong charstyle="No Style" pagenum="56"
parastyle="Gov–Head-B-CI" scriptlabel="">Lorem ipsum
dolor sit amet, consectetur adipiscing elit. Morbi et leo
purus. Maecenas at metus massa. Donec rutrum tortor ac enim
tincidunt ut posuere purus aliquam.</strong>
</p>
<p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI"
scriptlabel="">Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Morbi et leo purus. Maecenas at metus massa.
Donec rutrum tortor ac enim tincidunt ut posuere purus
aliquam.</p>
</section>
</document>`
Here's the XSLT to perform the split:
`<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="document">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="h1">
<xsl:variable name="filename"
select="concat(@scriptlabel,'.inc')" />
<xsl:value-of select="$filename" />
<xsl:result-document href="{$filename}">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:result-document>
</xsl:template>`
In short, this stylesheet:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="section">
<xsl:for-each-group select="node()" group-starting-with="h1">
<xsl:result-document href="{@scriptlabel}.inc">
<xsl:copy-of select="current-group()"/>
</xsl:result-document>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
writes this Gov-chairman-intro.inc:
<h1 charstyle="No Style"
pagenum="56"
parastyle="Gov-Head-A"
scriptlabel="Gov-chairman-intro"
>chairman’s introduction</h1>
<p charstyle="No Style"
pagenum="56"
parastyle="Gov–Head-B-CI"
scriptlabel="">
<strong charstyle="No Style"
pagenum="56"
parastyle="Gov–Head-B-CI"
scriptlabel=""
>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi et leo purus. Maecenas at metus massa. Donec rutrum tortor ac enim tincidunt ut posuere purus aliquam.</strong>
</p>
<p charstyle="No Style"
pagenum="56"
parastyle="Gov-Body-CI"
scriptlabel=""
>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi et leo purus. Maecenas at metus massa. Donec rutrum tortor ac enim tincidunt ut posuere purus aliquam.</p>
Note: This groups the children of section, starting a new group at each h1, and copies the whole current group into the result document.
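One thing to watch with the grouping above: with group-starting-with, the first item of the selection always opens a group, whether or not it matches the pattern. If the processor keeps the whitespace-only text node that sits before the first h1, that node forms a group of its own, @scriptlabel is empty there, and the href becomes just ".inc". The variant below is a minimal sketch that avoids this by selecting element children only. It assumes that section has no meaningful text directly inside it and that the .inc fragments should not begin with an XML declaration. Both assumptions are mine, not from the question.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: include fragments should not start with <?xml ...?> -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="section">
    <!-- select="*" skips text nodes, so whitespace before the first h1
         cannot open a group of its own -->
    <xsl:for-each-group select="*" group-starting-with="h1">
      <!-- The context item is the first element of the group (the h1) -->
      <xsl:result-document href="{@scriptlabel}.inc">
        <xsl:copy-of select="current-group()"/>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Because xsl:result-document has no format attribute here, it uses the unnamed xsl:output declaration, so the omit-xml-declaration setting applies to every .inc file.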
Update: This version also handles a section that has no h1 child, as well as a group that does not start with an h1.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="section">
<xsl:for-each-group select="*" group-adjacent="boolean(self::h1)">
<xsl:if test="not(current-grouping-key())">
<xsl:variable name="vMark" select="preceding-sibling::h1[1]"/>
<xsl:result-document
href="{((..|$vMark)/@scriptlabel)[last()]}.inc">
<xsl:copy-of select="current-group()|$vMark"/>
</xsl:result-document>
</xsl:if>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
With this input:
<document>
<section charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
scriptlabel="Gov-chairman-intro">
<h1 charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
scriptlabel="Gov-chairman-intro">chairman’s
introduction</h1>
<p charstyle="No Style" pagenum="56"
parastyle="Gov–Head-B-CI" scriptlabel="">
<strong charstyle="No Style" pagenum="56"
parastyle="Gov–Head-B-CI" scriptlabel=""
>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Morbi et leo purus. Maecenas at metus massa. Donec
rutrum tortor ac enim tincidunt ut posuere purus
aliquam.</strong>
</p>
<p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI"
scriptlabel="">Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Morbi et leo purus. Maecenas at metus
massa. Donec rutrum tortor ac enim tincidunt ut posuere
purus aliquam.</p>
</section>
<section charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
scriptlabel="Test-no-H1">
<p charstyle="No Style" pagenum="56"
parastyle="Gov–Head-B-CI" scriptlabel="">
<strong charstyle="No Style" pagenum="56"
parastyle="Gov–Head-B-CI" scriptlabel=""
>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Morbi et leo purus. Maecenas at metus massa. Donec
rutrum tortor ac enim tincidunt ut posuere purus
aliquam.</strong>
</p>
<p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI"
scriptlabel="">Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Morbi et leo purus. Maecenas at metus
massa. Donec rutrum tortor ac enim tincidunt ut posuere
purus aliquam.</p>
</section>
</document>
The stylesheet correctly writes Gov-chairman-intro.inc:
<h1 charstyle="No Style" pagenum="56" parastyle="Gov-Head-A" scriptlabel="Gov-chairman-intro">chairman’s
introduction</h1><p charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel=""><strong charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Morbi et leo purus. Maecenas at metus massa. Donec
rutrum tortor ac enim tincidunt ut posuere purus
aliquam.</strong></p><p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Morbi et leo purus. Maecenas at metus
massa. Donec rutrum tortor ac enim tincidunt ut posuere
purus aliquam.</p>
And Test-no-H1.inc:
<p charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel=""><strong charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Morbi et leo purus. Maecenas at metus massa. Donec
rutrum tortor ac enim tincidunt ut posuere purus
aliquam.</strong></p><p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Morbi et leo purus. Maecenas at metus
massa. Donec rutrum tortor ac enim tincidunt ut posuere
purus aliquam.</p>
Note: This groups adjacent elements by "Am I the mark?", then copies each non-mark group together with its preceding mark.
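The href expression in that stylesheet is compact, so here is the same idea spelled out, as a sketch rather than a replacement. The union (..|$vMark) is in document order, which puts the section's scriptlabel before the h1's, and [last()] therefore prefers the h1's label when an h1 exists. The variable name vLabel is one I introduced for illustration. One small difference: if an h1 exists but has no scriptlabel attribute, this version yields an empty name, whereas the original falls back to the section's label.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="section">
    <xsl:for-each-group select="*" group-adjacent="boolean(self::h1)">
      <!-- The key is false() for a run of non-h1 elements; runs of h1 are skipped -->
      <xsl:if test="not(current-grouping-key())">
        <!-- Nearest h1 before this run, or empty if there is none -->
        <xsl:variable name="vMark" select="preceding-sibling::h1[1]"/>
        <!-- vLabel (hypothetical name): the h1's label if there is an h1,
             otherwise the parent section's label -->
        <xsl:variable name="vLabel"
          select="if ($vMark) then $vMark/@scriptlabel else ../@scriptlabel"/>
        <xsl:result-document href="{$vLabel}.inc">
          <xsl:copy-of select="$vMark"/>
          <xsl:copy-of select="current-group()"/>
        </xsl:result-document>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

As in the original, an h1 that is not followed by any other element produces no file, because only the non-h1 runs trigger a result document.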
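You're matching on "h1", so it's only putting the h1 in the result document. The paragraphs that follow are siblings of the h1, not its children, so applying templates inside the h1 never reaches them.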
Can you re-organize your data so that you have...
<section>
<h1>Content 1</h1>
<p>...</p>
<p>...</p>
</section>
<section>
<h1>Content 2</h1>
<p>...</p>
<p>...</p>
</section>
You can rename the section tag to whatever you want so that existing code doesn't break. Then your XSLT will look like this:
<xsl:template match="section">
<xsl:variable name="filename"
select="concat(@scriptlabel,'.inc')" />
<xsl:value-of select="$filename" />
<xsl:result-document href="{$filename}">
<xsl:copy-of select=" ./* " />
</xsl:result-document>
</xsl:template>
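If restructuring the data is not an option, the h1-matching approach can still work as long as the template reaches out to the siblings explicitly. The following is a sketch under the assumption that every file should contain one h1 plus the elements that follow it up to the next h1. The variable name heading is mine.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Visit only the h1 children, so the built-in rules do not copy
       the paragraph text into the main output -->
  <xsl:template match="section">
    <xsl:apply-templates select="h1"/>
  </xsl:template>

  <xsl:template match="h1">
    <xsl:variable name="heading" select="."/>
    <xsl:result-document href="{@scriptlabel}.inc">
      <xsl:copy-of select="."/>
      <!-- Following non-h1 siblings whose nearest preceding h1 is this one -->
      <xsl:copy-of
        select="following-sibling::*[not(self::h1)]
                  [preceding-sibling::h1[1] is $heading]"/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```

This version writes nothing for a section without an h1, and the sibling test is re-evaluated for every h1, so on long sections a grouping-based stylesheet is likely to scale better.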