RegEx: How to find text between two strings
I have this text
XXX
text
XXX
XXX
text
XXX
XXX
text
XXX
and I want to capture the text between each XXX and the following XXX. (I am trying to extract chapters from a book.)
/XXX.*XXX/
This will capture the first begin and the last end
/XXX.*?XXX/
This will skip every second chapter
Thanks in advance, Barak
If your text contains line feeds (\n), you'll need to add the "dot matches newline" switch to your regex, as well as making your match non-greedy:
/(?s)XXX.*?XXX/
Edited: thanks to Alan's comment, I had the wrong switch; (?s) is correct.
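To make the difference between the three patterns concrete, here is a minimal sketch in Python, assuming the `re` module as the regex flavor (the question does not say which one is in use). The sample string and the chapter bodies `one`, `two`, `three` are made up for illustration.

```python
import re

# Sample input modelled on the question; the chapter bodies are invented.
text = "XXX\none\nXXX\nXXX\ntwo\nXXX\nXXX\nthree\nXXX\n"

# Without (?s), "." stops at newlines, so nothing matches on this input.
no_flag = re.findall(r"XXX.*?XXX", text)

# Greedy with (?s): a single match running from the first XXX to the last XXX.
greedy = re.findall(r"(?s)XXX.*XXX", text)

# Non-greedy with (?s): one match per XXX ... XXX pair.
# The capture group keeps only the body between the markers.
chapters = [body.strip() for body in re.findall(r"(?s)XXX(.*?)XXX", text)]

print(no_flag)
print(len(greedy))
print(chapters)
```

The capture group is a convenience so that the markers themselves are not part of the result; in Python, passing `flags=re.DOTALL` is equivalent to the inline `(?s)`.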
Solution using sed
$ sed -n '/XXX/,/XXX/{n;p}' text
text
text
text
If these XXX strings are always on separate lines, I would suggest simply iterating through the lines and picking out the chapters 'by hand'. It should be faster than a multi-line regexp.
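python:
    delim = "XXX"
    inside = False   # True while we are between an opening and a closing delimiter
    lines = []       # one list of lines per chapter
    idx = 0          # index of the chapter currently being filled
    # "book.txt" is a placeholder file name
    with open("book.txt") as f:
        for line in f:
            if line.strip() == delim:
                # each delimiter line toggles between inside and outside
                inside = not inside
                if inside:
                    lines.append([])
                else:
                    idx += 1
            elif inside:
                lines[idx].append(line)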
Your description doesn't really match your examples. If XXX is supposed to represent a chapter heading, there would only be one at the beginning of each chapter. To detect the end of a chapter, you would need to do a lookahead for the next chapter heading:
/XXX.*?(?=XXX)/s
That should work for all but the last chapter; to match that one as well, you can add \z, the end-of-input anchor, as an alternative inside the lookahead:
/XXX.*?(?=XXX|\z)/s
It really would help if we knew which regex flavor you're using. For example, in Ruby you would have to use /m instead of /s to allow . to match linefeeds.
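As an illustration of how the flavor changes the spelling, here is a rough Python version of the lookahead pattern. It assumes Python's `re` module, where the /s modifier becomes `re.DOTALL` and the end-of-string anchor is written `\Z`. The sample text, with XXX acting as a heading at the start of each chapter, is invented for the example.

```python
import re

# Invented sample: XXX marks the start of each chapter, with no closing marker.
text = "XXX one\nbody1\nXXX two\nbody2\nXXX three\nbody3\n"

# Match from each heading up to (but not including) the next heading,
# or up to the end of the string for the last chapter.
pattern = re.compile(r"XXX.*?(?=XXX|\Z)", re.DOTALL)

chapters = pattern.findall(text)

for chapter in chapters:
    print(repr(chapter))
```

Because the lookahead does not consume the next XXX, that heading is still available as the start of the following match, which is why no chapter gets skipped.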