Python Regular Expression

I'd like to extract the designator and the ops from a string of the form "designator: op1 op2", where there can be zero or more ops and multiple spaces are allowed between them. I used the following regular expression in Python:

import re
match = re.match(r"^(\w+):(\s+(\w+))*", "des1: op1   op2")

The problem is that only des1 and op2 are found in the matching groups; op1 is not. Does anyone know why?

The groups from the above code are (quoted so the whitespace is visible):
Group 0: 'des1: op1   op2'
Group 1: 'des1'
Group 2: '   op2'
Group 3: 'op2'


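Both are 'found', but a group can only 'capture' one value: when a capturing group is repeated by a quantifier such as *, it keeps only the text from its last repetition, which is why groups 2 and 3 end up holding op2. If you need to capture more than one value, you have to apply a regular expression more than once. You could do something like this, first by rewriting the main expression: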

match = re.match(r"^(\w+):(.*)", "des1: op1   op2")

Then extract the individual ops from the second group:

ops = re.split(r"\s+", match.groups()[1])[1:]
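The two steps can be wrapped into a small function. This is only a sketch: the name parse_designator is hypothetical, and instead of slicing with [1:] it filters out empty strings, so that a missing space after the colon ("des1:op1") does not drop an op and trailing whitespace does not add an empty one.

```python
import re


def parse_designator(text):
    """Hypothetical helper: split 'designator: op1 op2 ...' into (designator, [ops]).

    Returns None if the text does not start with a word followed by ':'.
    """
    # Step 1: one match for the designator and everything after the colon
    match = re.match(r"^(\w+):(.*)", text)
    if match is None:
        return None
    designator, rest = match.group(1), match.group(2)
    # Step 2: split the remainder on runs of whitespace; dropping empty
    # strings handles leading/trailing spaces and the zero-ops case
    ops = [op for op in re.split(r"\s+", rest) if op]
    return designator, ops


print(parse_designator("des1: op1   op2"))  # ('des1', ['op1', 'op2'])
print(parse_designator("des1:"))            # ('des1', [])
```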


I don't really see why you'd need a regex; this is quite simple to parse with string methods:

>>> des, _, ops = 'des1: op1   op2'.partition(':')
>>> ops
' op1   op2'
>>> ops.split()
['op1', 'op2']
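As a function, the same idea might look like the sketch below (parse_with_partition is a hypothetical name). Unlike the regex version it does not require the designator to be made of word characters; it just strips whatever comes before the first colon.

```python
def parse_with_partition(text):
    """Hypothetical helper: split 'designator: op1 op2 ...' using str methods only.

    Returns None if the text contains no colon.
    """
    # partition() splits at the first ':' and always returns a 3-tuple;
    # the separator part is '' when there is no colon at all
    des, sep, ops = text.partition(':')
    if not sep:
        return None
    # split() with no argument splits on runs of whitespace and ignores
    # leading/trailing whitespace, so zero ops yields an empty list
    return des.strip(), ops.split()


print(parse_with_partition('des1: op1   op2'))  # ('des1', ['op1', 'op2'])
print(parse_with_partition('des1:'))            # ('des1', [])
```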


I'd do something like this:

>>> import re
>>> tokenize = re.compile(flags=re.VERBOSE, pattern=r"""
...     (?P<de> \w+ (?=:) ) |
...     (?P<op> \w+)
... """).finditer
>>> for each in tokenize("des1: op1   op2"):
...     print(each.lastgroup, ':', each.group())
...
de : des1
op : op1
op : op2
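If you want the result as a designator plus a list rather than printed tokens, a sketch along these lines should work in Python 3 (parse_tokens is a hypothetical name). Note that finditer silently skips characters that match neither alternative, so this tokenizer does not validate the overall "designator: ops" format; for example, "a b: c" would yield 'b' as the designator.

```python
import re

# Same tokenizer as above: a word followed by ':' is the designator (de),
# any other word is an op. The raw string keeps "\w" from being treated
# as an (invalid) string escape.
tokenize = re.compile(r"""
    (?P<de> \w+ (?=:) ) |
    (?P<op> \w+)
""", re.VERBOSE).finditer


def parse_tokens(text):
    """Hypothetical helper: collect the tokens into (designator, [ops])."""
    designator, ops = None, []
    for token in tokenize(text):
        # lastgroup is the name of the alternative that matched
        if token.lastgroup == 'de':
            designator = token.group()
        else:
            ops.append(token.group())
    return designator, ops


print(parse_tokens("des1: op1   op2"))  # ('des1', ['op1', 'op2'])
```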