How do I grab the text of multiple tags in an XML feed using one XPath expression?

I'm trying to parse an xml feed that looks something like this:

<item>
<title>item title</title>
<link>item link</link>
<description>item description</description>
</item>

I'm trying to find an XPath expression that retrieves all the details of each item, so that each item in the feed ends up in its own array or is grouped in some way. I tried //item/*, but the resulting tags are not grouped, although they are in the correct order.

Is there any way of doing that?

edit:

[
[title1, link1, desc1],
[title2, link2, desc2],
[title3, link3, desc3]
]


From http://www.w3.org/TR/xpath/#section-Introduction

An expression is evaluated to yield an object, which has one of the following four basic types:

  • node-set (an unordered collection of nodes without duplicates)
  • boolean (true or false)
  • number (a floating-point number)
  • string (a sequence of UCS characters)

So XPath 1.0 has no "structure" data type such as a tuple. The standard solution for your task is to select the parent elements and iterate over them, reading each one's children with whatever DOM API your language provides.
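
A minimal sketch of that approach in Python, using the standard library's xml.etree.ElementTree rather than a DOM API in the strict sense. The sample feed and the name group_items are made up for illustration, and the sketch assumes every child of an item is a field you want:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, shaped like the one in the question.
FEED = """<root>
<item><title>title1</title><link>link1</link><description>desc1</description></item>
<item><title>title2</title><link>link2</link><description>desc2</description></item>
</root>"""


def group_items(xml_text):
    """Return one list of child texts per <item>, in document order."""
    root = ET.fromstring(xml_text)
    # Select the parent elements first, then read each parent's children.
    return [[child.text for child in item] for item in root.findall(".//item")]


print(group_items(FEED))
# [['title1', 'link1', 'desc1'], ['title2', 'link2', 'desc2']]
```

The grouping comes from the outer loop over the item elements, not from the XPath expression itself.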


With this input

<root>
<item>
    <title>item title</title>
    <link>item link</link>
    <description>item description</description>
</item>
<item>
    <title>item2</title>
    <link>link2</link>
    <description>description2</description>
</item>
</root>

And this xsl

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <xsl:template match="item">
        <xsl:value-of select="./title"/><xsl:text>
</xsl:text>
        <xsl:value-of select="./link"/><xsl:text>
</xsl:text>
        <xsl:value-of select="./description"/><xsl:text>
</xsl:text>
    </xsl:template>

</xsl:stylesheet>

You get this output

item title
item link
item description

item2
link2
description2

I hope this helps.


Here's an XPath 2.0 expression returning a sequence (assuming the XML input document from Stefanos' answer):

for $item in /root/item
  return ($item/title/text(), $item/link/text(), $item/description/text())

Sequences are ordered but do not allow nesting, so you cannot get exactly the kind of data structure you are asking for with pure XPath. With XSLT (or another host language), you can create new objects that provide the desired structure.
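
If the host language is Python, the flat sequence can be regrouped after the fact. This is only a sketch: the helper name chunk is made up, the flat list is typed in by hand to stand for the XPath result, and it assumes every item contributes exactly three values. That assumption fails if an element is missing or empty, because text() then contributes nothing and the groups drift out of alignment.

```python
def chunk(flat, size):
    """Split a flat list into consecutive groups of `size` elements."""
    if len(flat) % size != 0:
        # A leftover means at least one item did not contribute `size` values.
        raise ValueError("sequence length is not a multiple of the group size")
    return [flat[i:i + size] for i in range(0, len(flat), size)]


# Stand-in for the flat sequence returned by the XPath 2.0 expression.
flat = ["title1", "link1", "desc1", "title2", "link2", "desc2"]
print(chunk(flat, 3))
# [['title1', 'link1', 'desc1'], ['title2', 'link2', 'desc2']]
```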


You haven't specified a language, but if you're using Python (which is what the data structure you presented looks like), it's easy enough to do using lxml:

>>> from lxml import etree
>>> d = etree.fromstring("""<doc>
...  <item>
...   <title>item 1 title</title>
...   <link>item 1 link</link>
...   <description>item 1 description</description>
...  </item>
...  <item>
...   <title>item 2 title</title>
...   <link>item 2 link</link>
...   <description>item 2 description</description>
...  </item>
... </doc>""")
>>> [[e.xpath("title")[0].text,
...   e.xpath("link")[0].text,
...   e.xpath("description")[0].text]
...  for e in d.xpath("/doc/item")]
[['item 1 title', 'item 1 link', 'item 1 description'], ['item 2 title', 'item 2 link', 'item 2 description']]

This isn't quite so easy to do in a list comprehension if the XML's structure is unreliable; the above breaks if there's an item element that doesn't have a 'link' child, for instance.
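
One way to tolerate missing children is findtext, which returns a default instead of raising when the child is absent. The sketch below uses the standard library's xml.etree.ElementTree so it runs without extra packages (lxml elements offer findtext as well). The names FIELDS, SAMPLE and items_to_rows are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Field names are taken from the feed in the question.
FIELDS = ("title", "link", "description")

# Hypothetical sample: the first item has no <link> child.
SAMPLE = """<doc>
 <item>
  <title>item 1 title</title>
  <description>item 1 description</description>
 </item>
 <item>
  <title>item 2 title</title>
  <link>item 2 link</link>
  <description>item 2 description</description>
 </item>
</doc>"""


def items_to_rows(xml_text, default=None):
    """Return one [title, link, description] row per <item> child of the root."""
    root = ET.fromstring(xml_text)
    # findtext returns `default` when the child element does not exist.
    return [[item.findtext(name, default) for name in FIELDS]
            for item in root.findall("item")]


for row in items_to_rows(SAMPLE):
    print(row)
# ['item 1 title', None, 'item 1 description']
# ['item 2 title', 'item 2 link', 'item 2 description']
```

Every row keeps the same length and field order, so a missing value shows up as None in its slot rather than shifting the other fields.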
