How to regex in Python?

I am trying to parse the keywords from Google Suggest. This is the URL:

http://google.com/complete/search?output=toolbar&q=test

I've done it with PHP using:

'|<CompleteSuggestion><suggestion data="(.*?)"/><num_queries int="(.*?)"/></CompleteSuggestion>|is'

But that won't work with Python's re.match(pattern, string). I tried a few variations, but some raise an error and some return None.

How can I parse that info? I don't want to use minidom because I think a regex will be less code.


You could use etree:

>>> from xml.etree.ElementTree import XMLParser
>>> x = XMLParser()
>>> x.feed('<toplevel><CompleteSuggestion><suggestion data=...')
>>> tree = x.close()
>>> [(e.find('suggestion').get('data'), int(e.find('num_queries').get('int')))
...  for e in tree.findall('CompleteSuggestion')]
[('test internet speed', 31800000), ('test', 686000000), ...]

It is more code than a regex, but it also does more. Specifically, it fetches the entire list of matches in one go, converts the counts to integers, and unescapes XML entities such as escaped double quotes (&quot;) in the data attribute. It also won't get confused if additional elements start appearing in the XML.
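
For anyone who wants to run this without the interactive session, here is a minimal self-contained sketch of the same idea. It uses `ElementTree.fromstring` instead of feeding an `XMLParser` by hand, which should be equivalent for a string that is already in memory. The `SAMPLE` string and the `parse_suggestions` helper are my own illustration, not real Google output: the values are made up, and one entry contains `&quot;` to show the unescaping.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the toolbar response; the values are invented.
SAMPLE = (
    '<toplevel>'
    '<CompleteSuggestion><suggestion data="test &quot;speed&quot;"/>'
    '<num_queries int="100"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="test"/>'
    '<num_queries int="200"/></CompleteSuggestion>'
    '</toplevel>'
)


def parse_suggestions(xml_text):
    """Return a list of (suggestion, query_count) tuples from the XML text."""
    root = ET.fromstring(xml_text)
    return [
        # Attribute values come back already unescaped; counts are converted to int.
        (e.find('suggestion').get('data'), int(e.find('num_queries').get('int')))
        for e in root.findall('CompleteSuggestion')
    ]


print(parse_suggestions(SAMPLE))
```

Note that the `&quot;` entities come back as plain double quotes, which a regex capture would not do for you.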


RegEx match open tags except XHTML self-contained tags

This is an XML document. Please reconsider using an XML parser. It will be more robust and will probably take you less time in the end, even if it is more code.
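
If you are still curious why the regex attempt returned None, the likely reasons are these: `re.match` only matches at the very start of the string (and the response starts with other markup), and the PHP `|...|is` delimiters and flags are not part of the pattern in Python, where the flags become `re.IGNORECASE | re.DOTALL`. A rough sketch, assuming a made-up `SAMPLE` response rather than real Google output:

```python
import re

# The PHP pattern without its | delimiters; the "is" modifiers become Python flags.
PATTERN = re.compile(
    r'<CompleteSuggestion><suggestion data="(.*?)"/>'
    r'<num_queries int="(.*?)"/></CompleteSuggestion>',
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical sample shaped like the toolbar response; the values are invented.
SAMPLE = (
    '<toplevel>'
    '<CompleteSuggestion><suggestion data="test &quot;speed&quot;"/>'
    '<num_queries int="100"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="test"/>'
    '<num_queries int="200"/></CompleteSuggestion>'
    '</toplevel>'
)

# match() is anchored at position 0, where the string has <toplevel>, so it fails.
anchored = PATTERN.match(SAMPLE)

# findall() scans the whole string and returns one tuple of groups per match.
pairs = PATTERN.findall(SAMPLE)

print(anchored)
print(pairs)
```

Even when this works, the captured text keeps its `&quot;` entities and the counts stay strings, and the pattern breaks as soon as the markup changes slightly. That is the robustness argument for the parser.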
