How do I get the XML having No Root Node in Python

Asked 2023-04-04 14:19

Given the following data:

<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.
<channel rdf:about="http://www.gmanews.tv/">
        <title>GMANews.TV</title>
        <description> GMA News.tv bring you the latest news from GMA News teams and highlights of your favorite shows. Subscribe now and stay up-to-date with GMA News.tv.</description>
        <link>http://www.gmanews.tv/</link>
</channel>

<item rdf:about="http://www.gmanews.tv/story/232365/world/magnitude-59-quake-hits-chilean-coast-no-damage">
        <dc:format>text/html</dc:format>
        <dc:date>2011-09-14T16:39:22+08:00</dc:date>
        <dc:source>http://www.gmanews.tv/story/232365/world/magnitude-59-quake-hits-chilean-coast-no-damage </dc:source>
                <title><![CDATA[Magnitude-5.9 quake hits Chilean coast, no damage]]></title>
        <link>http://www.gmanews.tv/story/232365/world/magnitude-59-quake-hits-chilean-coast-no-damage </link>
        <description><![CDATA[SANTIAGO - A magnitude 5.9 quake hit just off the coast of central Chile early on Wednesday, but the state emergency office said there were no reports of damage.]]></description>
    </item>
        <item rdf:about="http://www.gmanews.tv/story/232362/nation/house-minority-blames-pnoys-advisers-for-legal-setbacks">
        <dc:format>text/html</dc:format>
        <dc:date>2011-09-14T16:04:51+08:00</dc:date>
        <dc:source>http://www.gmanews.tv/story/232362/nation/house-minority-blames-pnoys-advisers-for-legal-setbacks </dc:source>
                <title><![CDATA[House minority blames PNoy's advisers for legal 'setbacks']]></title>
        <link>http://www.gmanews.tv/story/232362/nation/house-minority-blames-pnoys-advisers-for-legal-setbacks </link>
        <description><![CDATA[Members of the opposition at the House of Representatives on Wednesday blamed President Benigno Aquino III's advisers for the various legal "setbacks&quot; suffered by his administration and advised him to consider replacing some of his advisers.]]></description>
    </item>
        <item rdf:about="http://www.gmanews.tv/story/232356/nation/ex-sharia-judge-20-others-may-testify-in-poll-fraud-probe">
        <dc:format>text/html</dc:format>
        <dc:date>2011-09-14T15:19:45+08:00</dc:date>
        <dc:source>http://www.gmanews.tv/story/232356/nation/ex-sharia-judge-20-others-may-testify-in-poll-fraud-probe </dc:source>
                <title><![CDATA[Ex-Shari'a judge, 20 others may testify in poll fraud probe]]></title>
        <link>http://www.gmanews.tv/story/232356/nation/ex-sharia-judge-20-others-may-testify-in-poll-fraud-probe </link>
        <description><![CDATA[The former Shari'a court judge who claimed to have helped Gloria Macapagal-Arroyo cheat in the 2004 presidential elections and at least 20 others may serve as witnesses in the joint investigation by the Commission on Elections and Department of Justice on the alleged poll fraud, Comelec chief Sixto Brillantes Jr. said Wednesday.]]></description>
    </item>
</rdf:RDF>

Now I want to get the details of all the elements inside each <item> tag. This is probably trivial, but I am new to Python, and I am not sure how to parse the RDF and then extract all the <item> elements inside it.

Edit: I cannot use any third-party libraries, as my script is going to run on an embedded system.

lxml provides a nice way to handle all things XML. An example for the XML you posted:

from lxml import etree

document = etree.parse('your-example-xml.rdf')
root = document.getroot()

# Namespace shortcuts: the default (unprefixed) namespace and the rdf one
ns = root.nsmap.get(None)
rdf = root.nsmap.get('rdf')

# XPath has no notion of a default namespace, so bind it to a prefix ('purl')
for item in root.xpath('purl:item', namespaces={'purl': ns}):
    print(item.attrib.get('{%s}about' % rdf))
    print(item.xpath('purl:description/text()', namespaces={'purl': ns}))
    print()

However, if it's only RDF you are parsing, there might be RDF-specific libraries available.

Since third party libraries are not an option, here's the same code done with Python's built-in ElementTree:

from xml.etree import ElementTree as etree

# parse() accepts a filename directly, so no explicit open() is needed
document = etree.parse('your-example-xml.rdf')
root = document.getroot()

ns_purl = 'http://purl.org/rss/1.0/'
ns_rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# ElementTree spells qualified names as '{namespace-uri}localname'
for item in root.findall('{%s}item' % ns_purl):
    print(item.attrib.get('{%s}about' % ns_rdf))
    print(item.find('{%s}description' % ns_purl).text)
    print()
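
Since the question asks for all the elements inside each <item>, not just the description, here is a rough sketch of how the ElementTree approach could be extended to collect every child into a dict. The SAMPLE string is a made-up, shortened feed in the same shape as the one posted, and local_name and parse_items are helper names invented for this example. The dc namespace URI is cut off in the question, so the usual Dublin Core URI is assumed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample shaped like the feed in the question.
# Assumption: the dc namespace URI (truncated in the question) is the
# standard Dublin Core one.
SAMPLE = """<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="http://www.gmanews.tv/">
    <title>GMANews.TV</title>
</channel>
<item rdf:about="http://www.gmanews.tv/story/1">
    <dc:format>text/html</dc:format>
    <title><![CDATA[First headline]]></title>
    <link>http://www.gmanews.tv/story/1 </link>
</item>
<item rdf:about="http://www.gmanews.tv/story/2">
    <dc:format>text/html</dc:format>
    <title><![CDATA[Second headline]]></title>
    <link>http://www.gmanews.tv/story/2 </link>
</item>
</rdf:RDF>"""

NS_RSS = "http://purl.org/rss/1.0/"
NS_RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_items(xml_text):
    """Return one dict per <item>, keyed by each child's local tag name."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("{%s}item" % NS_RSS):
        # Start with the rdf:about attribute, then add every child element.
        record = {"about": item.get("{%s}about" % NS_RDF)}
        for child in item:
            # CDATA sections come back as plain text; strip stray whitespace.
            record[local_name(child.tag)] = (child.text or "").strip()
        items.append(record)
    return items


items = parse_items(SAMPLE)
for record in items:
    print(record)
```

Dropping the namespace from the keys keeps the dicts easy to read, but two children with the same local name in different namespaces would overwrite each other. If that can happen in your feed, keep the full '{namespace-uri}name' tag as the key instead.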

Tags: python xml xml-parsing
