How to fetch values in quotes using Python

2023-02-14 04:11 Question author:

I have an XML file, and I parsed the data of the XML file to get a list as below:

humidity data="Humidity: 73%" icon data="/ig/images/weather/cloudy.gif" wind_condition data="Wind: N at 5 mph"

I want to write Python code that captures only the values in quotes and puts them in a list.

The following code shows how to parse XML using a proper XML parser. The xml stream is reconstructed from the partial information that you have supplied.

xml_strg = """<?xml version="1.0"?>
<xml_api_reply version="1">
    <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
        <forecast_information>
            <city data="Baton Rouge, LA"/>
            <postal_code data="baton rouge,la"/>
            <latitude_e6 data=""/>
            <longitude_e6 data=""/>
            <forecast_date data="2011-02-22"/>
            <current_date_time data="2011-02-22 20:06:59 +0000"/>
            <unit_system data="US"/>
        </forecast_information>
        <current_conditions>
            <condition data="Cloudy"/>
            <temp_f data="72"/>
            <temp_c data="22"/>
            <humidity data="Humidity: 73%"/>
            <icon data="/ig/images/weather/cloudy.gif"/>
            <wind_condition data="Wind: N at 5 mph"/>
        </current_conditions>
    </weather>
</xml_api_reply>
"""        

import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9

root = et.fromstring(xml_strg)
result = []
# Walk the children of <current_conditions> and keep only the wanted tags.
for elem in root.find('./weather/current_conditions'):
    if elem.tag in ('humidity', 'icon', 'wind_condition'):
        result.append(elem.get('data'))
print(result)

Output:

['Humidity: 73%', '/ig/images/weather/cloudy.gif', 'Wind: N at 5 mph']

What you show above isn't really a list, so we need to know how your data object really looks. For instance, if you have your example in a single string, like:

'humidity data="Humidity: 73%" icon data="/ig/images/weather/cloudy.gif" wind_condition data="Wind: N at 5 mph"'

You can parse this string to get all quoted parts in a list as follows:

import re
re.findall(r'"(.+?)"', in_string)

This uses non-greedy matching to find every substring that sits between an opening and a closing quote; the parentheses capture the text in between. For the full details of regular expressions, see the re module documentation at docs.python.org
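
As a minimal, self-contained sketch of the call above, assuming the whole input is held in one string (in_string is only the placeholder name used in this answer) and that the code runs under Python 3:

```python
import re

# The single-string form of the data from the question.
in_string = ('humidity data="Humidity: 73%" '
             'icon data="/ig/images/weather/cloudy.gif" '
             'wind_condition data="Wind: N at 5 mph"')

# Non-greedy group: capture everything between one quote and the next.
# .+? needs at least one character, so an empty value ("") is not matched
# cleanly; the pattern r'"([^"]*)"' handles that case.
values = re.findall(r'"(.+?)"', in_string)
print(values)
```

This prints the three quoted values as a list, in the order they appear in the string.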

The following will extract all condition blocks from your response, returning them in a list of dicts. From there you can get whatever you need.

#!/usr/bin/env python

from xml.etree.ElementTree import XML
data = """<?xml version="1.0"?>
<xml_api_reply version="1">
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0">
    <forecast_information>
        <city data="Baton Rouge, LA"/>
        <postal_code data="baton rouge,la"/>
        <latitude_e6 data=""/>
        <longitude_e6 data=""/>
        <forecast_date data="2011-02-22"/>
        <current_date_time data="2011-02-22 20:06:59 +0000"/>
        <unit_system data="US"/>
    </forecast_information>
    <current_conditions>
        <condition data="Cloudy"/>
        <temp_f data="72"/>
        <temp_c data="22"/>
        <humidity data="Humidity: 73%"/>
        <icon data="/ig/images/weather/cloudy.gif"/>
    </current_conditions>
</weather>
</xml_api_reply>
"""

tree = XML(data)
conditions = tree.findall("weather/current_conditions")
results = []
for c in conditions:
    curr_results = {}
    # Iterate the element directly; getchildren() was removed in Python 3.9.
    for child in c:
        curr_results[child.tag] = child.get('data')
    results.append(curr_results)

print(results)
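
As a short sketch of the "get whatever you need" step, the loop can be wrapped in a function and the resulting dicts indexed by tag name. The function name conditions_as_dicts and the trimmed-down sample document are assumptions made for illustration; the code targets Python 3, where iterating an element yields its children.

```python
from xml.etree.ElementTree import XML

# Trimmed-down sample with the same layout as the reply above.
sample = """<xml_api_reply version="1">
<weather>
    <current_conditions>
        <condition data="Cloudy"/>
        <humidity data="Humidity: 73%"/>
        <icon data="/ig/images/weather/cloudy.gif"/>
    </current_conditions>
</weather>
</xml_api_reply>"""


def conditions_as_dicts(xml_text):
    """Return one {tag: data} dict per current_conditions block."""
    tree = XML(xml_text)
    return [
        {child.tag: child.get('data') for child in block}
        for block in tree.findall('weather/current_conditions')
    ]


results = conditions_as_dicts(sample)
first = results[0]
# Pick out only the values of interest, in a fixed order.
wanted = [first[tag] for tag in ('humidity', 'icon')]
print(wanted)
```

Keeping the full dict and selecting keys afterwards means the parsing code does not need to change when a different set of values is wanted.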

With this text (note that I added the icon and wind_condition elements at the end, because this part isn't in your example) in a file called 'joeljames.txt':

<?xml version="1.0"?><xml_api_reply version="1"><weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" ><forecast_information><city data="Baton Rouge, LA"/><postal_code data="baton rouge,la"/><latitude_e6 data=""/><longitude_e6 data=""/><forecast_date data="2011-02-22"/><current_date_time data="2011-02-22 20:06:59 +0000"/><unit_system data="US"/></forecast_information><current_conditions><condition data="Cloudy"/><temp_f data="72"/><temp_c data="22"/><humidity data="Humidity: 73%"/><icon data="/ig/images/weather/cloudy.gif"/><wind_condition data="Wind: N at 5 mph"/>

the following short code

import re

with open('joeljames.txt') as f:  # text mode, so a str pattern can be used
    RE = ('humidity data="([^"]*)"/>'
          '<icon data="([^"]*)"/>'
          '<wind_condition data="([^"]*)"/>')
    print(re.search(RE, f.read()).groups())

or even

import re
print(re.search(('humidity data="([^"]*)"/>'
                 '<icon data="([^"]*)"/>'
                 '<wind_condition data="([^"]*)"/>'),
                open('joeljames.txt').read()).groups())

produce this result:

('Humidity: 73%', '/ig/images/weather/cloudy.gif', 'Wind: N at 5 mph')

Nothing more.

I know that the high priests of XML parsers will say that you MUST use an XML parser, because some of them are very efficient, a coder should be lazy, and so on. They are right when what must be obtained requires a complex algorithm.

But for an aim as simple as this one, I think it is justified not to resort to an XML parser, all the more so if one doesn't know how to use one. Do you?

For my solution, well, you must know regexes, yes. You need a minimum of tools when you want to get something done; you have to know a language too, after all.

You can use the parser solution, no problem. But now you know that it's possible with regexes too, and you can choose.

EDIT:

To answer the criticism that the order of the elements may not always be the same:

import re
print(dict(re.findall('(humidity data|icon data|wind_condition data)'
                      '="([^"]*)"/>', open('joeljames.txt').read())))

prints

{'humidity data': 'Humidity: 73%', 'icon data': '/ig/images/weather/cloudy.gif', 'wind_condition data': 'Wind: N at 5 mph'}
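
One concrete case where the regex and parser approaches differ is escaped characters. The sketch below uses a made-up fragment (the value "Wind: N &amp; gusty" is hypothetical and not taken from the weather feed) to show that a regex returns the text exactly as stored in the file, while an XML parser returns the decoded value. It assumes Python 3.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: the attribute value contains an escaped ampersand.
fragment = ('<current_conditions>'
            '<wind_condition data="Wind: N &amp; gusty"/>'
            '</current_conditions>')

# Regex: sees the characters exactly as written in the file.
raw = re.search(r'wind_condition data="([^"]*)"', fragment).group(1)

# Parser: decodes XML entities while reading the attribute.
parsed = ET.fromstring(fragment).find('wind_condition').get('data')

print(raw)     # Wind: N &amp; gusty
print(parsed)  # Wind: N & gusty
```

For the values in the question, which contain no escaped characters, both approaches give the same strings.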

Here's code that will extract all elements with a data attribute and convert them into a dictionary:

>>> from lxml import etree
>>> filePath = 'c:/test.xml'
>>> root = etree.parse(filePath)
>>> keypairs = dict((r.tag, r.get('data')) for r in root.xpath('//*[@data]'))

>>> print(keypairs)
{'city': 'Baton Rouge, LA', 'postal_code': 'baton rouge,la', 'latitude_e6': '', 'longitude_e6': '', 'forecast_date': '2011-02-22', 'current_date_time': '2011-02-22 20:06:59 +0000', 'unit_system': 'US', 'condition': 'Cloudy', 'temp_f': '72', 'temp_c': '22', 'humidity': 'Humidity: 73%', 'icon': '/ig/images/weather/cloudy.gif'}

>>> print(keypairs['humidity'])
Humidity: 73%
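
If lxml is not installed, roughly the same dictionary can be built with the standard library, whose ElementTree module supports a limited XPath subset that includes the [@attrib] predicate. This is a sketch for Python 3 that parses an inline, shortened sample rather than the c:/test.xml file used above.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same layout as the weather reply.
sample = """<xml_api_reply version="1">
<weather module_id="0">
    <forecast_information>
        <city data="Baton Rouge, LA"/>
    </forecast_information>
    <current_conditions>
        <temp_f data="72"/>
        <humidity data="Humidity: 73%"/>
    </current_conditions>
</weather>
</xml_api_reply>"""

root = ET.fromstring(sample)

# './/*[@data]' selects every descendant element that has a data attribute.
keypairs = {elem.tag: elem.get('data') for elem in root.findall('.//*[@data]')}
print(keypairs['humidity'])
```

As with the lxml version, elements that share a tag name overwrite each other in the dictionary, so this suits documents where each tag of interest appears once.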
