Having trouble parsing HTML

Asked 2023-02-16 11:32

I looked around and found a few examples of how to split text in Python, but I'm having problems with my example. Here's what I want to parse:

<img alt="" src="http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple">

Here's what I tried:

dict(kvpair.split('=') for kvpair in variableIwantToParse.split('&'))

I get the error "ValueError: dictionary update sequence element #0 has length 5; 2 is required"

I also tried variableIwantToParse.strip('&'), but when I printed variableIwantToParse it only displayed one letter at a time.

I'm sure this is easy, but I can't seem to figure out how to parse it. I basically want 10.21,12.12,43.12,12.10 to be associated with blue,red,green,purple (in the order displayed).

Thanks very much for your help (and sorry if this is too easy; I just can't for the life of me figure out the command to parse this) :-)

Use the built-in urlparse module; do not do these splits yourself.

>>> import urlparse
>>> url_to_parse = "http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple"
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = urlparse.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}
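
On Python 3 the same functions live in urllib.parse, and recent releases split the query string only on & by default, so the &amp; entities most likely need decoding before parsing. A minimal sketch, assuming Python 3; the variable name raw_src is made up for illustration:

```python
import html
from urllib.parse import urlsplit, parse_qs

# The src attribute exactly as it appears in the HTML, with &amp; entities.
raw_src = (
    "http://example.com/servlet/charting?base_color=grey&amp;chart_width=288"
    "&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie"
    "&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left"
    "&amp;static_xvalues=10.21,12.12,43.12,12.10,"
    "&amp;static_labels=blue,red,green,purple"
)

# Decode HTML entities first so that each "&amp;" becomes a plain "&".
url = html.unescape(raw_src)

# Split the URL into components and parse the query into a dict of lists.
query_as_dict = parse_qs(urlsplit(url).query)

print(query_as_dict["static_labels"])
print(query_as_dict["static_xvalues"])
```

Each value is a list because a key may appear more than once in a query string, which is why the later steps index with [0].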

If you're using a Python version earlier than 2.6, parse_qs is not available in urlparse, so import the cgi module and use cgi.parse_qs instead:

>>> import urlparse
>>> import cgi
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = cgi.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}

Then, to associate the labels with the values in a dictionary, use the dict constructor together with zip.

>>> print dict(zip(query_as_dict['static_labels'][0].split(','), query_as_dict['static_xvalues'][0].split(',')))
{'blue': '10.21', 'purple': '12.10', 'green': '43.12', 'red': '12.12'}
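
Note that the dictionary printed above does not keep the labels in the order displayed, and the values are still strings. If order and numeric values matter, one possible variation (a sketch assuming Python 3; the two input strings are copied from the parse_qs output above) is to build a list of pairs:

```python
# Values as they come out of parse_qs in the example above.
static_labels = "blue,red,green,purple"
static_xvalues = "10.21,12.12,43.12,12.10,"  # note the trailing comma

labels = static_labels.split(",")
# Drop the empty string produced by the trailing comma, then convert to float.
values = [float(v) for v in static_xvalues.split(",") if v]

# A list of (label, value) tuples keeps the order in which they were displayed.
pairs = list(zip(labels, values))
print(pairs)
```

On Python 3.7 and later, dict(pairs) also preserves this insertion order.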

This will get you what you want, assuming string_to_parse holds the whole <img> tag (the [:-2] slice drops the trailing ">):

d = dict(kv.split('=') for kv in string_to_parse.split('?')[1][:-2].split('&amp;'))
labels_and_values = zip(d['static_labels'].split(','), d['static_xvalues'].split(','))

It can be really useful to break things down at the interactive prompt when you run into trouble. For example:

>>> for kv in s.split('&'):
...     print kv.split('=')

If you check it out, you'll see that splitting on & was causing your issues (feeding dict too many values for one item in the list).
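
A sketch of that kind of inspection, assuming Python 3 and a shortened, hypothetical stand-in for the asker's string (s below is not the full tag), shows which piece dict() cannot use:

```python
# Shortened, hypothetical stand-in for the full <img> tag from the question.
s = ('<img alt="" src="http://example.com/servlet/charting'
     '?base_color=grey&amp;chart_width=288">')

# Record (number of parts, parts) for every piece produced by splitting on "&".
report = [(len(piece.split("=")), piece.split("=")) for piece in s.split("&")]
for count, parts in report:
    print(count, parts)

# dict() needs exactly two items per element, so the first piece makes it fail.
try:
    dict(piece.split("=") for piece in s.split("&"))
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

The first piece still contains the alt= and src= attributes, so it splits into more than two parts; the second piece splits cleanly but keeps a stray amp; prefix on its key.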

Use square brackets:

dict([kvpair.split('=') for kvpair in variableIwantToParse.split('&')])

Also, replacing &amp; with & could help.
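
As a rough illustration of that replacement, assuming Python 3 and that only the query-string part of the URL is being handled (escaped_query is a shortened, hypothetical sample), the original one-liner works once each &amp; entity is turned back into a plain &:

```python
# Query-string part of the URL, still HTML-escaped (shortened, hypothetical sample).
escaped_query = (
    "chart_size=small&amp;static_xvalues=10.21,12.12,"
    "&amp;static_labels=blue,red"
)

# Turn each "&amp;" entity back into a plain "&".
query = escaped_query.replace("&amp;", "&")

# maxsplit=1 guards against values that themselves contain "=".
params = dict(kvpair.split("=", 1) for kvpair in query.split("&"))
print(params["static_labels"])
```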
