
Having trouble parsing HTML

I looked around and found a few examples of how to split text in Python, but I'm having problems with my example. Here's what I want to parse:

<img alt="" src="http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple">

Here's what I tried:

dict(kvpair.split('=') for kvpair in variableIwantToParse.split('&'))

I get the error "ValueError: dictionary update sequence element #0 has length 5; 2 is required"

I also tried variableIwantToParse.strip('&'), but when I printed variableIwantToParse it only displayed one letter at a time.

I'm sure this is easy, but I can't seem to figure out how to parse it. I basically want 10.21,12.12,43.12,12.10 to be associated with blue,red,green,purple (in the order displayed).

Thanks very much for your help (and sorry if this is too easy... I just can't for the life of me figure out the command to parse this) :-)


Use the built-in urlparse module; don't do these splits yourself.

>>> import urlparse
>>> url_to_parse = "http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple"
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = urlparse.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}
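urlparse is the Python 2 name; in Python 3 the same functions live in urllib.parse. Below is a rough Python 3 equivalent of the session above, offered as a sketch. It assumes the src value was copied verbatim out of the HTML, so it first turns the `&amp;` entities back into plain `&` with html.unescape. Current Python 3 releases split the query only on `&` by default, so without that step the keys would come back as `amp;chart_width` and so on.

```python
from html import unescape
from urllib.parse import urlparse, parse_qs

# The src attribute exactly as it appears in the HTML (entities still escaped).
src = ("http://example.com/servlet/charting?base_color=grey&amp;chart_width=288"
       "&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie"
       "&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left"
       "&amp;static_xvalues=10.21,12.12,43.12,12.10,"
       "&amp;static_labels=blue,red,green,purple")

# Turn "&amp;" back into "&" before parsing the query string.
url = unescape(src)

# parse_qs maps each key to a list of values.
query_as_dict = parse_qs(urlparse(url).query)

print(query_as_dict["static_labels"])   # ['blue,red,green,purple']
print(query_as_dict["static_xvalues"])  # ['10.21,12.12,43.12,12.10,']
```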

If you're using a Python version older than 2.6, parse_qs isn't available in urlparse yet, so you have to import it from the cgi module instead:

>>> import urlparse
>>> import cgi
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = cgi.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}

Then, to pair each label with its value in a dictionary, use the built-in dict constructor together with zip:

>>> print dict(zip(query_as_dict['static_labels'][0].split(','), query_as_dict['static_xvalues'][0].split(',')))
{'blue': '10.21', 'purple': '12.10', 'green': '43.12', 'red': '12.12'}
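Note that static_xvalues ends with a comma, so split(',') returns a trailing empty string; zip simply stops at the shorter list, which is why it doesn't show up above. If you'd rather drop it explicitly and get numbers instead of strings, a small Python 3 helper along these lines could work (pair_labels_with_values is a made-up name for this example, not a library function):

```python
def pair_labels_with_values(labels_field, values_field):
    """Pair comma-separated labels with comma-separated numbers, in order."""
    # Skip empty pieces, such as the one left by a trailing comma.
    labels = [s for s in labels_field.split(",") if s]
    values = [float(s) for s in values_field.split(",") if s]
    # Fail loudly instead of letting zip silently truncate.
    if len(labels) != len(values):
        raise ValueError("got %d labels but %d values" % (len(labels), len(values)))
    return dict(zip(labels, values))


print(pair_labels_with_values("blue,red,green,purple", "10.21,12.12,43.12,12.10,"))
# {'blue': 10.21, 'red': 12.12, 'green': 43.12, 'purple': 12.1}
```

Since Python 3.7, dicts keep insertion order, so the pairs come out in the order they appear in the URL.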


This will get you what you want:

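# Take the text after '?', drop the tag's trailing '">', then split on the escaped '&amp;'
d = dict(kv.split('=') for kv in string_to_parse.split('?')[1][:-2].split('&amp;'))
# Pair each label with its value, in order
labels_and_values = zip(d['static_labels'].split(','), d['static_xvalues'].split(','))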
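Slicing off the last two characters with [:-2] relies on the string ending in exactly `">`. If the input is the whole <img> tag, a sturdier option is to let the standard library's html.parser extract the src attribute; it also unescapes `&amp;` in attribute values for you. A minimal Python 3 sketch, using a shortened copy of the tag (SrcCollector is just an illustrative name):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class SrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values are already unescaped.
        if tag == "img":
            self.srcs.extend(value for name, value in attrs if name == "src")


tag = ('<img alt="" src="http://example.com/servlet/charting?base_color=grey'
       '&amp;static_xvalues=10.21,12.12,43.12,12.10,'
       '&amp;static_labels=blue,red,green,purple">')

collector = SrcCollector()
collector.feed(tag)
collector.close()

query = parse_qs(urlparse(collector.srcs[0]).query)
labels = query["static_labels"][0].split(",")
values = query["static_xvalues"][0].split(",")
# zip stops at the shorter list, so the empty string after the trailing comma is ignored.
print(list(zip(labels, values)))
# [('blue', '10.21'), ('red', '12.12'), ('green', '43.12'), ('purple', '12.10')]
```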

It can be really useful to break things down at the interactive prompt when you run into trouble. For example:

>>> for kv in s.split('&'):
...     print kv.split('=')

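If you try that, you'll see that splitting on '&' was causing the problem: at least one of the resulting pieces contains more than one '=', so kv.split('=') returns more than two items, while dict needs exactly two (a key and a value) for each entry.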
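Here is that breakdown as a small Python 3 script, run against a shortened copy of the tag (tag and chunk are just illustrative names). It prints how many pieces each '&'-separated chunk splits into on '='; any chunk that doesn't give exactly two is one dict will reject.

```python
# Shortened copy of the tag from the question.
tag = ('<img alt="" src="http://example.com/servlet/charting?base_color=grey'
       '&amp;chart_width=288&amp;static_labels=blue,red,green,purple">')

# Inspect each '&'-separated chunk: dict() needs exactly 2 items per pair.
for chunk in tag.split("&"):
    parts = chunk.split("=")
    print(len(parts), parts)

try:
    dict(chunk.split("=") for chunk in tag.split("&"))
except ValueError as exc:
    # The first chunk still carries '<img alt="" src="...', so it has extra '='.
    print("ValueError:", exc)
```

With this input the first chunk splits into four pieces because it still contains the `<img alt="" src="` prefix, and the later chunks produce keys such as `amp;chart_width`, which is the second reason to split on `&amp;` (or unescape first).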


Try square brackets:

dict([kvpair.split('=') for kvpair in variableIwantToParse.split('&')])

Also, splitting on '&amp;' instead of '&' could help.
