Regex: match sequence of not a certain string [duplicate]
s = 'blah blah blah... _ABC_superman_is_cool_CBA_ ...blah blah blah...'
This is just an example, but I want to match everything between _ABC_ and _CBA_. So 'superman_is_cool'. There may be multiple sections of _ABC_..._CBA_.
re.findall('_ABC_(.*)(?=_CBA_)', s)
I tried this first, but it obviously doesn't work correctly.
I added an additional _ABC_, _CBA_ pair to make sure it finds all the matches:
>>> s = 'blah blah blah... _ABC_superman_is_cool_CBA_ ...blah blah _ABC_blah_CBA_...'
>>> re.findall('_ABC_(.*?)_CBA_', s)
['superman_is_cool', 'blah']
The ? makes the * operator non-greedy, so it finds as short a match as possible. Without it, the result would be ['superman_is_cool_CBA_ ...blah blah _ABC_blah'].
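To see the difference side by side, here is a small sketch that runs both the greedy and the non-greedy pattern on the same string. The helper find_between is a hypothetical name added only for illustration (it is not part of the re module); it uses re.escape so that delimiters containing regex metacharacters would still be matched literally.

```python
import re

s = 'blah blah blah... _ABC_superman_is_cool_CBA_ ...blah blah _ABC_blah_CBA_...'

# Greedy: .* runs to the LAST _CBA_ in the string, so both sections merge.
greedy = re.findall('_ABC_(.*)_CBA_', s)

# Non-greedy: .*? stops at the FIRST _CBA_ after each _ABC_.
lazy = re.findall('_ABC_(.*?)_CBA_', s)

print(greedy)  # ['superman_is_cool_CBA_ ...blah blah _ABC_blah']
print(lazy)    # ['superman_is_cool', 'blah']


def find_between(text, start, end):
    """Hypothetical helper: return every shortest span between start and end."""
    # re.escape makes the delimiters match literally, whatever they contain.
    pattern = re.escape(start) + '(.*?)' + re.escape(end)
    return re.findall(pattern, text)


print(find_between(s, '_ABC_', '_CBA_'))  # ['superman_is_cool', 'blah']
```

Note that . does not match newlines by default, so if a section can span several lines you would probably need to pass re.DOTALL as well.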
Try this:
re.findall('_ABC_(.*?)_CBA_', s)