Creating Regular Expressions in Python

I'm trying to create a regular expression that extracts part of the following text:

amd64 build of software 1:0.98.10-0.2svn20090909 in archive

What I want to extract is:

software 1:0.98.10-0.2svn20090909

How can I do this? I've been trying, and this is what I have so far:

p = re.compile('([a-zA-Z0-9\-\+\.]+)\ ([0-9\:\.\-]+)')
iterator = p.finditer("amd64 build of software 1:0.98.10-0.2svn20090909 in archive")
for match in iterator:
    print match.group()

with result:

software 1:0.98.10-0.2

(svn20090909 is missing)

Thanks a lot.


This will work:

import re

p = re.compile(r'([a-zA-Z0-9\-\+\.]+)\ ([0-9][0-9a-zA-Z\:\.\-]+)')
iterator = p.finditer("amd64 build of dvdrip software 1:0.98.10-0.2svn20090909 in archive")
for match in iterator:
    print(match.group())
# Prints: software 1:0.98.10-0.2svn20090909

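That works by allowing the second captured group (the version part) to contain letters while still insisting that it starts with a digit. Your original pattern stopped at "svn" because its second character class had no letters in it.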

Without seeing all the other strings it needs to match, I can't be sure whether that's good enough.
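To make the difference concrete, here is a minimal sketch that runs both patterns against the sample line from the question. The variable names (original, fixed, and so on) are only illustrative, and it assumes Python 3.

```python
import re

text = "amd64 build of software 1:0.98.10-0.2svn20090909 in archive"

# Pattern from the question: the second group allows only digits, ':', '.' and '-'.
original = re.compile(r'([a-zA-Z0-9\-\+\.]+)\ ([0-9\:\.\-]+)')
# Pattern from this answer: the second group starts with a digit, then may also contain letters.
fixed = re.compile(r'([a-zA-Z0-9\-\+\.]+)\ ([0-9][0-9a-zA-Z\:\.\-]+)')

original_matches = [m.group() for m in original.finditer(text)]
fixed_matches = [m.group() for m in fixed.finditer(text)]

print(original_matches)  # ['software 1:0.98.10-0.2']
print(fixed_matches)     # ['software 1:0.98.10-0.2svn20090909']
```

Running it against a handful of your real lines in the same way is probably the quickest way to find out whether the pattern is strict enough.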


If your lines are consistent, that is, if each entry is on one line and the word you want always sits directly before the version part (the 1:0.98 ... part) at the same position in the line, you don't need a regexp. Try this:

>>> s = 'amd64 build of software 1:0.98.10-0.2svn20090909 in archive'
>>> match = [s.split()[3], s.split()[4]]
>>> print(match)
['software', '1:0.98.10-0.2svn20090909']
>>> # alternatively
>>> match = s.split()[3:5] # for same result

What this does is the following: it splits the line s on whitespace (using the string method split()) and selects the fourth and fifth elements of the resulting list (indices 3 and 4); both are stored in the variable match.

Again, this only works if you have one entry per line and if the 'software' part is always the fourth word, immediately followed by the 1:0.98.10-0.2svn20090909 part.
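If you want to guard against lines that are too short, the same idea can be wrapped in a small function. This is only a sketch under the assumptions above (fixed word positions); the name extract_name_and_version is made up for illustration.

```python
def extract_name_and_version(line):
    """Return [name, version] taken from the 4th and 5th whitespace-separated words."""
    words = line.split()
    if len(words) < 5:
        # Not enough words on the line; signal that nothing could be extracted.
        return None
    return words[3:5]

s = 'amd64 build of software 1:0.98.10-0.2svn20090909 in archive'
result = extract_name_and_version(s)
print(result)            # ['software', '1:0.98.10-0.2svn20090909']
print(' '.join(result))  # software 1:0.98.10-0.2svn20090909
```

Joining the two pieces with a space gives back the single string asked for in the question.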

I often avoid regexps when splitting the line into a list is enough. If the parsing becomes a nightmare, I use pyparsing.


Don't use a capturing group if you want everything in one piece.
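A rough sketch of what that looks like, assuming Python 3 and the sample line from the question (the pattern is the earlier one with the parentheses and the unneeded backslashes dropped). Note that match.group() with no arguments returns the whole match either way; the groups mostly change what re.findall gives you.

```python
import re

text = "amd64 build of software 1:0.98.10-0.2svn20090909 in archive"

# No capturing groups: name, a space, and the version are matched as one piece.
pattern = re.compile(r'[a-zA-Z0-9+.-]+ [0-9][0-9a-zA-Z:.-]+')

whole = pattern.search(text).group()
print(whole)  # software 1:0.98.10-0.2svn20090909

# Without groups, findall returns the whole matched strings.
ungrouped_result = pattern.findall(text)
print(ungrouped_result)  # ['software 1:0.98.10-0.2svn20090909']

# With groups, findall returns one tuple of group contents per match instead.
grouped = re.compile(r'([a-zA-Z0-9+.-]+) ([0-9][0-9a-zA-Z:.-]+)')
grouped_result = grouped.findall(text)
print(grouped_result)  # [('software', '1:0.98.10-0.2svn20090909')]
```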
