Match a paragraph that starts with certain letters
I have a file that contains paragraphs starting with AB, and I want to extract all of them. I used the following code, but it returns nothing:
import re
paragraphs = re.findall(r'AB[.\n]+AD',text) #AD is the beginning of the next paragraph
Any idea why this did not work?
Thanks
Try:
re.findall(r'AB.+?(?=AD)', text, re.DOTALL)
The re.DOTALL flag lets the dot match every character, including newlines. The non-greedy .+? then matches as little as possible, and the lookahead (?=AD) stops the match just before the first following AD without including AD in the matched string.
You can then call rstrip() on the resulting strings to remove trailing whitespace, including the newlines at the end.
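As a minimal sketch of the approach above, the snippet below runs the pattern on a made-up sample. The contents of text and the name cleaned are assumptions for illustration, not taken from the question's actual file.

```python
import re

# Hypothetical sample: paragraphs starting with "AB", each followed by
# a paragraph starting with "AD" (as described in the question).
text = "AB one\nmore\n\nAD two\n\nAB three\n\nAD four\n"

# DOTALL lets "." match newlines; ".+?" is non-greedy, so each match
# stops at the first "AD" that follows; the lookahead leaves "AD" out.
paragraphs = re.findall(r'AB.+?(?=AD)', text, re.DOTALL)

# Strip the trailing blank lines left at the end of each match.
cleaned = [p.rstrip() for p in paragraphs]
print(cleaned)
```

Note that a final AB paragraph with no AD after it would not be matched by this pattern, since the lookahead requires AD to follow.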
From the Python re module documentation:
[]
Used to indicate a set of characters. Characters can be listed individually,
or a range of characters can be indicated by giving two characters and
separating them by a '-'. Special characters are not active inside sets.
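This means that a . inside the brackets matches a literal dot, not any character as it would elsewhere in a regexp. So [.\n]+ only matches runs of dots and newlines, which is why the original pattern finds nothing.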
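A small sketch may make the difference concrete. The sample strings and variable names here are hypothetical; the last pattern is one possible alternative that avoids the flag, not necessarily what either answer intended.

```python
import re

# Hypothetical sample text in the shape described in the question.
text = "AB first line\nsecond line\n\nAD next\n"

# Inside brackets the dot is literal, so [.\n]+ cannot cross letters.
broken = re.findall(r'AB[.\n]+AD', text)

# The same set applied to a short string: only "." and newlines match.
pieces = re.findall(r'[.\n]+', "a.b\n.c")

# An alternation outside brackets keeps "." special, so no flag is needed.
fixed = re.findall(r'AB(?:.|\n)+?(?=AD)', text)

print(broken, pieces, fixed)
```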