
ElementTree namespace inconvenience

I can't control the quality of the XML I receive. In some cases it is:

<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
...
</COLLADA>

in others I get:

 <COLLADA>...</COLLADA>

and I guess I should also handle:

 <collada:COLLADA xmlns:collada="http://www.collada.org/2005/11/COLLADASchema">
 ...
 </collada:COLLADA>

It's the same schema in every case, and I want a single parser to process all of it. How can I handle all of these cases? I need XPath and the other lxml goodies to work through the data. How can I make the namespaces consistent at etree.parse time, so that I don't have to check them every time I use XPath?


My usual recommendation is to preprocess it first, to normalize the namespaces. This has two benefits: the normalization code is highly reusable, because it doesn't depend on how the data is being processed subsequently; and the logic to process the data is considerably simplified.

If the documents only use this one namespace, or none, and do not use qualified names in the content of text or attribute nodes, then the transformation to achieve this normalization is very simple:

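<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Rebuild every element in the COLLADA namespace, keeping its local name and attributes. -->
  <!-- name is an attribute value template, so local-name() must be wrapped in braces. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="http://www.collada.org/2005/11/COLLADASchema">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>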
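As a rough Python counterpart to the template above, here is a sketch that uses only the standard library's xml.etree.ElementTree instead of XSLT. It parses the input once, then moves every element into the COLLADA namespace while keeping its local name and attributes, so later queries can always use one prefix. The function name normalize_collada is hypothetical, and the sketch carries the same assumptions as the stylesheet: the documents use only this namespace (or none) and contain no qualified names in text or attribute content.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the question; every element is moved into it.
COLLADA_NS = "http://www.collada.org/2005/11/COLLADASchema"


def normalize_collada(source):
    """Parse XML text and put every element into the COLLADA namespace.

    Hypothetical helper mirroring the XSLT template: each element keeps its
    local name and attributes but gains the COLLADA namespace. Assumes the
    input uses only this namespace (or none) and no QNames in content.
    """
    root = ET.fromstring(source)
    for elem in root.iter():
        # ElementTree stores namespaced tags as "{uri}local", plain ones as "local".
        local = elem.tag.rsplit("}", 1)[-1]
        elem.tag = f"{{{COLLADA_NS}}}{local}"
    return root


# The three input shapes from the question, reduced to a tiny body.
samples = [
    '<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1"><asset/></COLLADA>',
    '<COLLADA><asset/></COLLADA>',
    '<collada:COLLADA xmlns:collada="http://www.collada.org/2005/11/COLLADASchema">'
    '<collada:asset/></collada:COLLADA>',
]

NS = {"c": COLLADA_NS}
for text in samples:
    root = normalize_collada(text)
    # The same prefixed query now works for every input shape.
    print(root.tag, len(root.findall("c:asset", NS)))
    # each line: {http://www.collada.org/2005/11/COLLADASchema}COLLADA 1
```

ElementTree's findall understands only a limited XPath subset. If you need full XPath in lxml, the stylesheet itself can be compiled with `etree.XSLT` and applied once, right after `etree.parse`, so every later query sees the normalized namespace.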
