
Two very close regexes with lookahead assertions in Python - why does re.split() behave differently?

I was trying to answer this question, where the OP has the following string:

"path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"

and wants to split it to obtain the following list:

['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I tried to solve it by using a simple lookahead assertion in a regex, (?=path:). Well, it did not work:

>>> import re
>>> s = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
>>> r = re.compile('(?=path:)')
>>> r.split(s)
['path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism']

However, in this answer, the answerer got it working by preceding the lookahead assertion with a whitespace:

>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

Why did the regex work with the whitespace? Why did it not work without the whitespace?


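Python's re.split() has a documented limitation: before Python 3.7 it could not split on zero-length matches. A bare lookahead such as (?=path:) consumes no characters, so every match it finds is empty and re.split() ignored it, returning the string unsplit. The added space gives the pattern one real character to match, so the split only worked with the space. Since Python 3.7, re.split() does split on empty matches, but with (?=path:) alone the result starts with an empty string and the first piece keeps its trailing space, so the version with the space is still the cleaner pattern.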
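
A minimal sketch of both approaches, in case it helps. The first call is the pattern from the question with the leading space. The second uses `split_at_lookahead`, a hypothetical helper (not part of the `re` module) that cuts the string at the positions reported by `re.finditer()`, which does return zero-length matches. It is only one way to do this, under the assumption that you want to keep the bare lookahead.

```python
import re

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# The space is a real character, so each match is non-empty and
# re.split() has something to split on.
parts = re.split(r' (?=path:)', line)


def split_at_lookahead(pattern, text):
    """Hypothetical helper: cut text at every zero-width match of pattern."""
    starts = [m.start() for m in re.finditer(pattern, text)]
    # Make sure the first piece starts at 0 and the last ends at len(text).
    bounds = sorted(set([0] + starts + [len(text)]))
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


# The separating space stays attached to the preceding piece, so strip it.
pieces = [p.strip() for p in split_at_lookahead(r'(?=path:)', line)]

print(parts)
print(pieces)
```

Both lists should contain the two 'path:...' entries from the question. The helper avoids relying on how re.split() treats empty matches, at the cost of a few extra lines.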
