HTML to CALS tables?

I'm checking to see if anyone has an XSLT lying around that transforms HTML tables to CALS. I've found a lot of material on going the other way (CALS to HTML), but nothing that starts from HTML. I thought somebody might have done this before, so I wouldn't have to reinvent the wheel. I'm not looking for a complete solution, just a starting point.

If I get far enough on my own, I'll post it for future reference.


I've come up with a much simpler solution than what @Flack linked to:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Find the widest row (counting colspan) and emit it as tgroup/@cols -->
<xsl:template match="tbody">
    <xsl:variable name="maxColumns">
        <xsl:for-each select="tr">
            <xsl:sort select="sum(td/@colspan) + count(td[not(@colspan)])" data-type="number"/>
            <xsl:if test="position() = last()">
                <xsl:value-of select="sum(td/@colspan) + count(td[not(@colspan)])"/>
            </xsl:if>
        </xsl:for-each>
    </xsl:variable>
    <tgroup>
        <xsl:attribute name="cols">
            <xsl:value-of select="$maxColumns"/>
        </xsl:attribute>
        <xsl:apply-templates select="@*|node()"/>
    </tgroup>
</xsl:template>

<!-- Multi-column cells: turn colspan into namest/nameend column positions -->
<xsl:template match="td[@colspan > 1]">
    <entry>
        <xsl:attribute name="namest">
            <xsl:value-of select="sum(preceding-sibling::td/@colspan) + count(preceding-sibling::td[not(@colspan)]) + 1"/>
        </xsl:attribute>
        <xsl:attribute name="nameend">
            <xsl:value-of select="sum(preceding-sibling::td/@colspan) + count(preceding-sibling::td[not(@colspan)]) + @colspan"/>
        </xsl:attribute>
        <xsl:apply-templates select="@*[name() != 'colspan']|node()"/>
    </entry>
</xsl:template>

<xsl:template match="tr">
    <row>
        <xsl:apply-templates select="@*|node()"/>
    </row>
</xsl:template>

<xsl:template match="td">
    <entry>
        <xsl:apply-templates select="@*|node()"/>
    </entry>
</xsl:template>

<!-- rowspan="n" becomes morerows="n - 1" -->
<xsl:template match="td/@rowspan">
    <xsl:attribute name="morerows">
        <xsl:value-of select=". - 1"/>
    </xsl:attribute>
</xsl:template>

<!-- fallback rule -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>

There are two tricky points. First, a CALS table needs a tgroup/@cols attribute containing the number of columns. So we need to find the maximum number of cells in one row in the XHTML table - but we must heed colspan declarations so that a cell with colspan > 1 creates the right number of columns! The first template in my stylesheet does just that, based on @Tim C's answer to the max cells per row problem.
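
For anyone who finds the sort-and-take-last idiom hard to read, here is the same column count as a minimal Python sketch. The function name `count_columns` and the sample table are made up for illustration. Like the stylesheet, it only looks at `td` cells and ignores cells carried over from earlier rows via rowspan.

```python
import xml.etree.ElementTree as ET


def count_columns(tbody):
    """Return the width of the widest row, counting colspan (default 1)."""
    def row_width(tr):
        return sum(int(td.get("colspan", "1")) for td in tr.findall("td"))

    # default=0 covers a tbody without any rows
    return max((row_width(tr) for tr in tbody.findall("tr")), default=0)


sample = ET.fromstring(
    "<tbody>"
    "<tr><td>a</td><td colspan='3'>b</td></tr>"
    "<tr><td>c</td><td>d</td><td>e</td></tr>"
    "</tbody>"
)
print(count_columns(sample))  # 4: the first row spans 1 + 3 columns
```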

Another problem is that for multi-column cells XHTML says "this cell is 3 columns wide" (colspan="3") while CALS will say "this cell starts in column 2 and ends in column 4" (namest="2" nameend="4"). That transformation is done in the second template in the stylesheet.
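
The position arithmetic can be sketched in Python as a running column counter. The helper name `span_positions` is hypothetical, and the sketch makes the same simplifying assumption as the stylesheet: no cell from a previous row reaches into this one via rowspan. As far as I know, CALS resolves namest/nameend against colspec/@colname values, so a strict consumer may also want matching colspec elements, which neither the stylesheet nor this sketch emits.

```python
import xml.etree.ElementTree as ET


def span_positions(tr):
    """Return a (start, end) pair of 1-based column numbers for each td in the row."""
    positions = []
    column = 1
    for td in tr.findall("td"):
        width = int(td.get("colspan", "1"))
        positions.append((column, column + width - 1))
        column += width  # the next cell starts right after this one
    return positions


row = ET.fromstring("<tr><td>a</td><td colspan='3'>b</td><td>c</td></tr>")
print(span_positions(row))  # [(1, 1), (2, 4), (5, 5)]
```

The middle cell comes out as (2, 4), matching the namest="2" nameend="4" example above.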

The rest is indeed fairly straightforward. The stylesheet doesn't deal with details like changing style="width: 50%" into width="50%" etc. but those are relatively common problems, I believe.
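
As a rough illustration of the width detail, the following Python sketch lifts a `width` declaration out of a `style` attribute. The helper name `lift_width` and the regular expression are my own assumptions, and the target attribute name `width` simply mirrors the example in the sentence above; a real CALS consumer may expect the value elsewhere.

```python
import re
import xml.etree.ElementTree as ET

# Match a "width" declaration at the start of the style or after a semicolon,
# so that properties such as "min-width" are not picked up by mistake.
WIDTH_RE = re.compile(r"(?:^|;)\s*width\s*:\s*([^;]+)", re.IGNORECASE)


def lift_width(element):
    """Copy a CSS width found in @style into a plain width attribute."""
    match = WIDTH_RE.search(element.get("style", ""))
    if match:
        element.set("width", match.group(1).strip())
    return element


cell = lift_width(ET.fromstring('<td style="color: red; width: 50%">x</td>'))
print(cell.get("width"))  # 50%
```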


I know it's four years later, but I'm posting this for anyone who comes across this question:

ISOSTS XHTML table to CALS conversion


I know that this is a late answer, but I'm currently developing a Python library that makes it easy to convert tables from one XML format to another.

To convert the tables of a .docx document to CALS format, you can proceed as follows:

import os
import zipfile

from benker.converters.ooxml2cals import convert_ooxml2cals

# - Unzip the ``.docx`` in a temporary directory
src_zip = "/path/to/demo.docx"
tmp_dir = "/path/to/tmp/dir/"
with zipfile.ZipFile(src_zip) as zf:
    zf.extractall(tmp_dir)

# - Source paths
src_xml = os.path.join(tmp_dir, "word/document.xml")
styles_xml = os.path.join(tmp_dir, "word/styles.xml")

# - Destination path
dst_xml = "/path/to/demo.xml"

# - Create some options and convert tables
options = {
    'encoding': 'utf-8',
    'styles_path': styles_xml,
    'width_unit': "mm",
    'table_in_tgroup': True,
}
convert_ooxml2cals(src_xml, dst_xml, **options)

See: https://benker.readthedocs.io

Note: support for the (X)HTML format is coming soon (contributions are welcome).


Though I don't understand the particular difficulty, I did some googling:

  • Stylesheet for converting XHTML tables to CALS