Tricky issue with using XSLT with badly formed HTML
I am fairly new to XSLT (2.0) and am having trouble with a tricky issue. Essentially, I have a badly formatted HTML file like the one below:
<html>
<body>
<p> text 1 </p>
<div> <p> text 2</p> </div>
<p> Here is a list
<ul>
<ol>
<li> ListItem1 </li>
<li> ListItem1 </li>
</ol>
<dl>
<li> dl item </li>
<li> dl item2 </li>
</dl>
</ul>
<div>
<p> I was here</p>
</div>
</p>
</body>
</html>
I am trying to turn it into a nicely formatted XML file. In my XSLT file I recursively check whether all children of a p or div are other p's or div's and, if so, just promote them; otherwise I treat them as stand-alone paragraphs. I extended this idea so that a p or div with a child list shows up properly, without promoting the list's children.
The problem is that the output XML I get is the following:
<?xml version="1.0" encoding="utf-8"?><html>
<body>
<p> text 1 </p>
<p> text 2</p>
Here is a list
<ul>
<ol>
<li> ListItem1 </li>
<li> ListItem1 </li>
</ol>
<dl>
<li> dl item </li>
<li> dl item2 </li>
</dl>
</ul>
<p> I was here</p>
</body>
</html>
"Here is a list" needs to be in paragraph tags too! I am going crazy trying to solve this ... Any input/links would be greatly appreciated.
This transformation:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- A div or p that contains another div or p is not copied; only its children are processed -->
  <xsl:template match=
    "div[descendant::div or descendant::p]
    |
     p[descendant::div or descendant::p]
    ">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Text directly inside such a div or p is wrapped in a new element named after its parent -->
  <xsl:template match=
    "div[descendant::div or descendant::p]/text()
    |
     p[descendant::div or descendant::p]/text()
    ">
    <xsl:element name="{name(..)}"
     namespace="{namespace-uri(..)}">
      <xsl:copy-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
when applied to the provided XML document, produces the wanted, correct output:
<html>
<body>
<p> text 1 </p>
<p> text 2</p>
<p> Here is a list
</p>
<ul>
<ol>
<li> ListItem1 </li>
<li> ListItem1 </li>
</ol>
<dl>
<li> dl item </li>
<li> dl item2 </li>
</dl>
</ul>
<p> I was here</p>
</body>
</html>
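One caveat with the stylesheet above: it wraps each text node separately, so if the loose text contains inline markup (for example a b element), that element is copied outside of any paragraph and the surrounding text is split into two paragraphs. Since the question mentions XSLT 2.0, a possible variation is to group adjacent non-block children and wrap each group in a single p. The following is only a sketch under some assumptions: the list of block-level names (p, div, ul, ol, dl) is my own guess at what should stay unwrapped, the wrapper is always a p in no namespace, and it needs an XSLT 2.0 processor.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- A div or p that contains another div or p: drop the wrapper
       and regroup its children -->
  <xsl:template match=
    "div[descendant::div or descendant::p]
    |
     p[descendant::div or descendant::p]
    ">
    <!-- Grouping key is true for block-level children (assumed list),
         false for text and inline elements -->
    <xsl:for-each-group select="node()"
     group-adjacent="boolean(self::p or self::div or self::ul
                             or self::ol or self::dl)">
      <xsl:choose>
        <!-- Block-level children are processed as they are -->
        <xsl:when test="current-grouping-key()">
          <xsl:apply-templates select="current-group()"/>
        </xsl:when>
        <!-- A run of text and inline nodes becomes one paragraph -->
        <xsl:otherwise>
          <p>
            <xsl:apply-templates select="current-group()"/>
          </p>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

For the document in the question this should give the same result as the XSLT 1.0 solution, because the only loose content is the single text node "Here is a list". The difference only shows up when text and inline elements sit next to each other inside a p or div that is being unwrapped.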