
Find the optional middle of a string surrounded by lazy matches (regex)

I'm using Python's re module to try to extract the optional middle of a string.

>>> re.search(r'(.*?)(HELLO|BYE)?(.*?END)', r'qweHELLOsdfsEND').groups()
('', None, 'qweHELLOsdfsEND') #what I want is ('qwe', 'HELLO', 'sdfsEND')
>>> re.search(r'(.*?)(HELLO|BYE)?(.*?END)', r'qweBLAHsdfsEND').groups()
('', None, 'qweBLAHsdfsEND') #when the middle doesn't match. this is OK

How can I extract the optional middle?

Note: This is my first post.


Your regex fails because the lazy first part is happy to match the empty string, the second part then fails to match (which is allowed, since it's optional), and so the third part captures everything. Solution: make the first part match anything up to HELLO or BYE:

>>> re.search(r'((?:(?!HELLO|BYE).)*)(HELLO|BYE)?(.*?END)', r'qweHELLOsdfsEND').groups()
('qwe', 'HELLO', 'sdfsEND')
>>> re.search(r'((?:(?!HELLO|BYE).)*)(HELLO|BYE)?(.*?END)', r'qweBLAHsdfsEND').groups()
('qweBLAHsdfs', None, 'END')

Is that acceptable?

Explanation:

(?:         # Try to match the following:
 (?!        # First assert that it's impossible to match
  HELLO|BYE # HELLO or BYE
 )          # at this point in the string.
 .          # If so, match any character.
)*          # Do this any number of times.
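
As a minimal sketch, the pattern above could be wrapped in a small helper. The name split_middle and its return convention are hypothetical, chosen only for illustration; they are not from the original post.

```python
import re

# Same pattern as in the answer above, compiled once.
PATTERN = re.compile(r'((?:(?!HELLO|BYE).)*)(HELLO|BYE)?(.*?END)')

def split_middle(text):
    """Hypothetical helper: return (prefix, middle, suffix) or None.

    middle is None when neither HELLO nor BYE occurs before END.
    """
    match = PATTERN.search(text)
    return match.groups() if match else None

print(split_middle('qweHELLOsdfsEND'))  # ('qwe', 'HELLO', 'sdfsEND')
print(split_middle('qweBLAHsdfsEND'))   # ('qweBLAHsdfs', None, 'END')
```

Note that when the middle is absent, everything before END ends up in the first group rather than the third, as in the second call.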


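You can also make the middle group mandatory and handle the no-match case yourself:

import re

try:
    groups = re.search(r'(.*?)(HELLO|BYE)(.*?END)', r'qweHELLOsdfsEND').groups()
except AttributeError:  # re.search returned None: no HELLO or BYE before END
    print('no match')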
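
The same idea can be written without relying on an exception, by testing the result of re.search for None first. This is only a sketch; the function name extract_middle is an assumption made for illustration.

```python
import re

def extract_middle(text):
    """Hypothetical helper: groups if HELLO or BYE is present, else None."""
    # The middle group is mandatory here, so a string without
    # HELLO or BYE produces no match at all.
    match = re.search(r'(.*?)(HELLO|BYE)(.*?END)', text)
    if match is None:
        return None
    return match.groups()

print(extract_middle('qweHELLOsdfsEND'))  # ('qwe', 'HELLO', 'sdfsEND')
print(extract_middle('qweBLAHsdfsEND'))   # None
```

Unlike the lookahead version, this returns nothing at all for a string with no middle, so the caller has to decide what the prefix and suffix should be in that case.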