Find the optional middle of a string surrounded by lazy matches (regex)
I'm using Python's re module to try to extract the optional middle of a string.
>>> re.search(r'(.*?)(HELLO|BYE)?(.*?END)', r'qweHELLOsdfsEND').groups()
('', None, 'qweHELLOsdfsEND') #what I want is ('qwe', 'HELLO', 'sdfsEND')
>>> re.search(r'(.*?)(HELLO|BYE)?(.*?END)', r'qweBLAHsdfsEND').groups()
('', None, 'qweBLAHsdfsEND') #when the middle doesn't match. this is OK
How can I extract the optional middle?
Note: This is my first post.
Your regex fails because the lazy first part is happy to match the empty string, the second part then fails to match (which is allowed, since it's optional), and so the third part captures everything. Solution: make the first part match anything up to HELLO or BYE (or, if neither occurs, up to END):
>>> re.search(r'((?:(?!HELLO|BYE).)*)(HELLO|BYE)?(.*?END)', r'qweHELLOsdfsEND').groups()
('qwe', 'HELLO', 'sdfsEND')
>>> re.search(r'((?:(?!HELLO|BYE).)*)(HELLO|BYE)?(.*?END)', r'qweBLAHsdfsEND').groups()
('qweBLAHsdfs', None, 'END')
Is that acceptable?
Explanation:
(?:           # Try to match the following:
 (?!          # First assert that it's impossible to match
  HELLO|BYE   # HELLO or BYE
 )            # at this point in the string.
 .            # If so, match any character.
)*            # Do this any number of times.
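For what it's worth, the same pattern can be written with re.VERBOSE so the explanation lives inside the regex itself. This is only a sketch: the names PATTERN and split_optional_middle are made up for illustration, and it assumes the only possible middles are HELLO and BYE, as in the question.

```python
import re

# Hypothetical names; the regex is the one from the answer above,
# spelled out with re.VERBOSE so whitespace and comments are ignored.
PATTERN = re.compile(r"""
    (                  # group 1: the prefix
      (?:              # repeat the following:
        (?!HELLO|BYE)  #   assert we are not at the start of HELLO or BYE
        .              #   then consume one character
      )*
    )
    (HELLO|BYE)?       # group 2: the optional middle
    (.*?END)           # group 3: the rest, up to and including END
""", re.VERBOSE)


def split_optional_middle(text):
    """Return (prefix, middle, rest), or None if the text contains no END."""
    match = PATTERN.search(text)
    return match.groups() if match else None


print(split_optional_middle('qweHELLOsdfsEND'))  # ('qwe', 'HELLO', 'sdfsEND')
print(split_optional_middle('qweBLAHsdfsEND'))   # ('qweBLAHsdfs', None, 'END')
```

Checking the match object for None before calling .groups() avoids an AttributeError on input that has no END at all.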
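You can do it like this, making the middle group mandatory and treating a failed match as the no-middle case:
import re

try:
    print(re.search(r'(.*?)(HELLO|BYE)(.*?END)', r'qweHELLOsdfsEND').groups())
except AttributeError:
    # re.search returned None, so calling .groups() on it raised AttributeError
    print('no match')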
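A variation on the same idea, in case you'd rather test for None than catch the exception. This is a rough sketch, not a definitive version: the names WITH_MIDDLE, WITHOUT_MIDDLE and extract are invented here, and the fallback shape ('', None, rest) is an assumption based on the output the question says is OK when the middle is absent.

```python
import re

# Hypothetical names, for illustration only.
WITH_MIDDLE = re.compile(r'(.*?)(HELLO|BYE)(.*?END)')  # middle is mandatory here
WITHOUT_MIDDLE = re.compile(r'.*?END')                 # fallback: just find END


def extract(text):
    """Try the mandatory-middle pattern first, then fall back."""
    match = WITH_MIDDLE.search(text)
    if match:
        return match.groups()
    match = WITHOUT_MIDDLE.search(text)
    if match:
        # No HELLO/BYE found: mirror the shape the question accepts.
        return ('', None, match.group(0))
    return None  # not even END was found


print(extract('qweHELLOsdfsEND'))  # ('qwe', 'HELLO', 'sdfsEND')
print(extract('qweBLAHsdfsEND'))   # ('', None, 'qweBLAHsdfsEND')
```

Because the middle is mandatory in the first pattern, the lazy (.*?) is forced to grow until it reaches HELLO or BYE, which is exactly what the optional version failed to do.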