
How do I get rid of incorrect xmlns attributes in HTML transformed with XSLT?

I am trying to transform an .html document using XSLT. For some reason, the generated HTML has an extra xmlns attribute on the head element and an empty xmlns attribute on the title element.

example.html:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>foo</title></head>
  <body><h1>bar</h1><img src="baz.jpg" /></body>
</html>

template.xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output doctype-system="about:legacy-compat" method="html"
     omit-xml-declaration="yes" />

  <xsl:template match="/html/head">
    <head>
      <meta name="description" content="something added to the head element"/>
      <xsl:apply-templates select="./@*|./node()" />
    </head>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

I've been testing the transformation with xsltproc and with PHP.

Running xsltproc:

$ xsltproc -html template.xsl example.html 
<!DOCTYPE html SYSTEM "about:legacy-compat">
<html xmlns="http://www.w3.org/1999/xhtml">
<head xmlns="http://www.w3.org/1999/xhtml"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="description" content="something added to the head element"></meta><title xmlns="">foo</title></head><body>
<h1>bar</h1>
<img src="baz.jpg">
</body>
</html>

Using PHP:

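<?php

// Parse the source with the HTML parser rather than the XML parser.
$xmldoc = new DOMDocument();
$xmldoc->loadHTMLFile("example.html");

// The stylesheet is XML, so load() is used here.
$xsldoc = new DOMDocument();
$xsldoc->load("template.xsl");

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsldoc);

// Print the serialized result of the transformation.
echo $xslt->transformToXML($xmldoc);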

I would expect all the elements in the source document to be in the html namespace, so I don't understand why apply-templates seemingly removes the namespace from the title element. I also don't understand why the html namespace is added to the head element.


The http://www.w3.org/1999/xhtml namespace is for XHTML. You should therefore either set the output method to xml and output the correct doctype for XHTML, or serialize as html and not use any namespace at all.

Note that XSLT is not well suited to generating HTML5, but it works well for HTML 4 or XHTML if you pay attention to details, such as which elements must or must not be empty.
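
For the XHTML route described above, a minimal sketch of the output declaration could look like the following. The XHTML 1.0 Strict doctype is an assumption chosen for illustration; pick whichever XHTML doctype your document actually conforms to.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize as XML and emit an XHTML doctype
       (Strict is an illustrative assumption, not taken from the question) -->
  <xsl:output method="xml" omit-xml-declaration="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <!-- Identity template: copies every node unchanged, namespace included -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This sketch assumes the source is read by a namespace-aware XML parser, so the copied elements keep the XHTML namespace they had in the input.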


In this specific case, use a modified identity transformation that strips the namespace from every element:

<xsl:template match="@*|node()[not(self::*)]">
  <xsl:copy/>
</xsl:template>

<xsl:template match="*">
  <xsl:element name="{local-name()}">
    <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
</xsl:template>

Also remove the default namespace declaration from the xsl:stylesheet element in your XSLT:

xmlns="http://www.w3.org/1999/xhtml"

The complete stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output doctype-system="about:legacy-compat" method="html"
        omit-xml-declaration="yes" />

    <xsl:template match="/html/head">
        <head>
            <meta name="description" content="something added to the head element"/>
            <xsl:apply-templates select="./@*|./node()" />
        </head>
    </xsl:template>

    <xsl:template match="@*|node()[not(self::*)]">
        <xsl:copy/>
    </xsl:template>

    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:apply-templates select="node()|@*"/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>


I can't explain or reproduce your results.

Firstly, your template with match="/html/head" should not match anything in your input document, because your /html/head elements are in a namespace.

With Saxon the output I get is this, which I believe to be correct:

<!DOCTYPE html
  SYSTEM "about:legacy-compat">
<html xmlns="http://www.w3.org/1999/xhtml">  
   <head>
      <title>foo</title>
   </head>   
   <body>
      <h1>bar</h1><img src="baz.jpg"></img></body>  
</html>

So either you're doing something different from what you say (e.g. using a different stylesheet or a different source document from the one shown) or there's a bug in the XSLT processor you are using.
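
To illustrate the point about match="/html/head": when the source is parsed as namespace-aware XML, the head template only matches if the pattern names the elements in the XHTML namespace. A minimal sketch, assuming the prefix h (an arbitrary name introduced here, not part of the question) is bound to that namespace:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <xsl:output doctype-system="about:legacy-compat" method="html"
      omit-xml-declaration="yes"/>

  <!-- h:html and h:head match elements in the XHTML namespace;
       the unprefixed /html/head would only match no-namespace elements -->
  <xsl:template match="/h:html/h:head">
    <head>
      <meta name="description" content="something added to the head element"/>
      <xsl:apply-templates select="@*|node()"/>
    </head>
  </xsl:template>

  <!-- Identity template for everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Apart from the prefix binding, exclude-result-prefixes, and the match pattern, this is the stylesheet from the question. exclude-result-prefixes="h" keeps the otherwise unused xmlns:h declaration out of the generated head element.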
