Selecting a child text node amongst white space text nodes, in a complex XML element using XPath
I've been racking my brain over this but can't seem to get it right, and I'm not hitting the right keywords on Google.
I've recently started to play around with XSLT and XPath to create an XML description of natural language glossaries – for a project of mine.
The problem is that I have chosen to use 'mixed content' complex elements for some words and in some instances want to fetch just the text node.
Here's a portion of the XML document:
...
<entry category="substantiv">
<word lang="sv">semester</word>
<word lang="de">
<article>der</article>Urlaub
<plural>Urlaube</plural>
</word>
</entry>
...
There are many entry elements in my document, and in this instance I want to fetch 'Urlaub' by using: /entry/word[@lang='de']/text()
, which won't work because of my line breaks. I've discovered that there are actually three text nodes, so .../text()[2]
will work, of course. However, I don't know beforehand where there will be line breaks, or how many. If the XML is formatted like the following, my first version of the path will work but not the second:
...
<word lang="de"><article>der</article>Urlaub
<plural>Urlaube</plural>
</word>
...
What I think I want to do is select all the immediate text nodes of word[@lang='de'], and then remove unnecessary white space using normalize-space()
. However, how do I do this using XPath? Or is there a better way? It seems like it would be easy but I can't figure it out. I am by the way trying to do this within an XSLT document.
normalize-space(/entry/word[@lang='de']/text()[*])
is one of the things I have tried, but that seems to do something else.
Grateful for any help.
Update:
Here is part of the XSLT, as requested:
...
<xsl:choose>
<xsl:when test="@category='substantiv'">
<em><xsl:value-of select="word[@lang='de']/article" /></em>
<xsl:value-of select="normalize-space(word[@lang='de']/text()[2])" />
<em>pl. <xsl:value-of select="word[@lang='de']/plural" /></em>
</xsl:when>
...
This code works just fine with the first version of formatting. To clarify, what I want to do is to grab the value of the text node in the complex element <word lang="de">
, regardless of how it is formatted with line breaks and white space. What I will do with the value depends on context, but right now I will just put it in an XHTML document.
Update2:
I am now using <xsl:strip-space elements="*"/>
which eliminates the problem of having empty text nodes. I am also using:
...
<xsl:choose>
<xsl:when test="@category='substantiv'">
<em><xsl:value-of select="word[@lang='de']/article" /></em>
<xsl:text> </xsl:text>
<xsl:value-of select="normalize-space(word[@lang='de']/text())" />
<xsl:text>, </xsl:text>
<em>pl. <xsl:value-of select="word[@lang='de']/plural" /></em>
</xsl:when>
...
Still have to normalize though since a space is still added after "Urlaub" in the XML.
When I need to reach the text node "Urlaub" outside of the XSLT document I use:
<xsl:value-of select="normalize-space(word[@lang='de']/text()[normalize-space() != ''])" />
Thanks for all the help folks!
Update 3: Tried to improve the title
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:value-of select="/*/entry/word[@lang='de']/text()[1]"/>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document (wrapped in a dict
top element):
<dict>
<entry category="substantiv">
<word lang="sv">semester</word>
<word lang="de">
<article>der</article>Urlaub
<plural>Urlaube</plural>
</word>
</entry>
</dict>
produces exactly the wanted result:
Urlaub
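Do note the use of the <xsl:strip-space>
instruction, which eliminates all white-space-only text nodes from the source XML document.
Therefore no filtering of empty text nodes is necessary; normalize-space() is needed only if the line break that follows "Urlaub" inside its text node must be trimmed as well.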
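If the trailing line break inside the selected text node matters, the selection can be wrapped in normalize-space(). A minimal sketch, assuming the same dict wrapper as above; the method="text" output setting is my assumption and not part of the original stylesheet:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- drop white-space-only text nodes from the source tree -->
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <!-- text()[1] is the text node holding "Urlaub" plus any trailing
         line break; normalize-space() trims that white space -->
    <xsl:value-of select="normalize-space(/*/entry/word[@lang='de']/text()[1])"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that normalize-space() converts its argument to a string first, so if several word elements match, only the first selected text node in document order is used.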
Try:
/entry/word[@lang='de']/child::text()[normalize-space(.) != '']
Meaning, grab all child text nodes but not those that normalize to an empty string.
-Oisin
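Inside an XSLT 1.0 stylesheet this might be used roughly as follows. A minimal sketch, assuming the entry elements sit somewhere below the document root (hence //entry rather than /entry, since the surrounding document is elided in the question) and that plain-text output with one word per line is wanted:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//entry/word[@lang='de']">
      <!-- keep only child text nodes that are not white-space-only;
           normalize-space() then trims the first of them -->
      <xsl:value-of select="normalize-space(text()[normalize-space(.) != ''])"/>
      <!-- line break between entries -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This variant does not rely on xsl:strip-space, so it should behave the same whichever way the source is formatted.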
I think this is the skeleton of what you want, minus any normalize-space() to get things to look exactly the way you want.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:apply-templates select=".//word"/>
</xsl:template>
<xsl:template match="word">
<xsl:apply-templates select=".//text()"/>
</xsl:template>
<xsl:template match="text()"><xsl:value-of select="."/><xsl:text> </xsl:text></xsl:template>
</xsl:stylesheet>
The key is the .//text()
which selects ALL descendant text nodes, at any nesting level below the context node; the text() template then outputs them one after another.
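To compare what the child path and the descendant path reach, the text nodes can simply be counted. A small sketch, assuming the sample document from the question; the output labels are made up for illustration:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//word[@lang='de']">
      <!-- text() looks only at the children of the context node -->
      <xsl:text>child text nodes: </xsl:text>
      <xsl:value-of select="count(text())"/>
      <xsl:text>&#10;</xsl:text>
      <!-- .//text() also reaches into article and plural -->
      <xsl:text>descendant text nodes: </xsl:text>
      <xsl:value-of select="count(.//text())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample entry the two counts should differ by two (the text inside article and plural), whatever white-space-only nodes the parser keeps.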
Now that I see your code I recommend this:
<xsl:choose>
<xsl:when test="@category='substantiv'">
<em><xsl:value-of select="word[@lang='de']/article" /></em>
<!-- select the first non-empty text node and normalize it -->
<xsl:value-of select="normalize-space(word[@lang='de']/text()[normalize-space() != ''][1])" />
<em>pl. <xsl:value-of select="word[@lang='de']/plural" /></em>
</xsl:when>
Original Version of the answer
To get you started:
<entry category="substantiv">
<word lang="sv">semester</word>
<word lang="de">
<article>der</article>Urlaub
<plural>Urlaube</plural>
</word>
</entry>
When passed through this XSLT 1.0:
<!-- identity template copies everything 1:1, unless other templates apply -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<!-- empty template: ignore every white-space-only text-node child of <word> -->
<xsl:template match="word/text()[normalize-space() = '']" />
Would produce roughly this (any line break after "Urlaub" belongs to a non-empty text node and is therefore kept):
<entry category="substantiv">
<word lang="sv">semester</word>
<word lang="de"><article>der</article>Urlaub<plural>Urlaube</plural></word>
</entry>
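If the remaining text node should also lose its surrounding white space, a further template can trim it. A minimal sketch as a complete stylesheet, assuming the identity template is meant to copy text nodes too (it matches node() rather than only elements); the third template is my addition:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>
  <!-- identity template: copy every node and attribute unless overridden -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- drop white-space-only text nodes directly inside <word> -->
  <xsl:template match="word/text()[normalize-space() = '']"/>
  <!-- trim the remaining text nodes directly inside <word> -->
  <xsl:template match="word/text()[normalize-space() != '']">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>
</xsl:stylesheet>
```

The two word/text() templates have mutually exclusive predicates, so they never compete for the same node, and both take precedence over the identity template.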
This answer is a guess and may not be exactly what you are after. Your question needs clarification in any case. What you think you want is not always the same as what you actually want.