Filter text which appears between two marks
Part 1
What is the easiest way to create a text filter which outputs only the text surrounded by two predefined marks? I don't mind using any standard tool: sed, awk, python, ...
For example, I would like only the text surrounded by "Mark Begin" and "Mark End" to appear.
input:
Text 1
Mark Begin
Text 2
Mark End
Text 3
Mark Begin
Text 4
Mark End
Text 4
output:
Text 2
Text 4
Part 2
How can the solution be modified so that only the last occurrence is written to the output? For the same input as above, we would get:
output:
Text 4
$ awk '/Mark End/{f=0}/Mark Begin/{f=1;next}f' file
Text 2
Text 4
$ awk '/Mark End/{f=0}/Mark Begin/{f=1;next}f{p=$0}END{print p}' file
Text 4
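The flag idea behind these one-liners can also be sketched in Python. This is only an illustration under a few assumptions: `blocks_between` is a hypothetical helper name, the marks are matched as whole lines rather than as regex substrings, and a block that is never closed is dropped. Note that the second one-liner remembers only the last printed line, which is enough when each block is a single line; the sketch below keeps the whole last block.

```python
def blocks_between(lines, begin="Mark Begin", end="Mark End"):
    """Return a list of blocks; each block holds the lines between one begin/end pair."""
    blocks = []
    current = None  # None means we are currently outside any block
    for line in lines:
        text = line.rstrip("\n")
        if text == begin:
            current = []  # start a new block, discarding any unfinished one
        elif text == end:
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            current.append(text)
    return blocks


sample = """Text 1
Mark Begin
Text 2
Mark End
Text 3
Mark Begin
Text 4
Mark End
Text 4
""".splitlines()

blocks = blocks_between(sample)

# Part 1: print every block
for block in blocks:
    print("\n".join(block))

print("---")

# Part 2: print only the last block, with all of its lines
if blocks:
    print("\n".join(blocks[-1]))
```

For a real file, passing the open file object instead of `sample` should work the same way, since trailing newlines are stripped per line.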
Part 1:
awk '
tolower($0) ~ /mark begin/ {printing = 1; next}
tolower($0) ~ /mark end/ {printing = 0; next}
printing {print}
'
Part 2:
awk '
tolower($0) ~ /mark begin/ {capturing = 1; text = ""; sep = ""; next}
tolower($0) ~ /mark end/ {capturing = 0; next}
capturing {text = text sep $0; sep = "\n"}
END {print text}
'
I found a compact solution using an awk range pattern:
awk '/Mark Begin/,/Mark End/' file.lst
It prints every block together with its mark lines, so the marks still have to be filtered out afterwards.
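To make the extra filtering step concrete, here is a rough Python sketch. It assumes the range runs from the begin mark to the end mark, and `range_lines` is a hypothetical helper that only imitates an awk range pattern with plain substring matching.

```python
def range_lines(lines, begin, end):
    """Yield each line from one containing `begin` through the next containing `end`, inclusive."""
    inside = False
    for line in lines:
        if not inside and begin in line:
            inside = True
        if inside:
            yield line
            if end in line:
                inside = False  # the range closes on the end mark


sample = [
    "Text 1", "Mark Begin", "Text 2", "Mark End",
    "Text 3", "Mark Begin", "Text 4", "Mark End", "Text 4",
]

# Step 1: the range alone still contains the mark lines
ranged = list(range_lines(sample, "Mark Begin", "Mark End"))

# Step 2: drop the marks to keep only the enclosed text
filtered = [l for l in ranged if "Mark Begin" not in l and "Mark End" not in l]
print("\n".join(filtered))
```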
A functional (stateless) implementation using Python and lazy generators:
import itertools
def get_lines_between_marks(ilines, start_mark, end_mark):
    ilines = iter(ilines)  # share one iterator between the loop and takewhile
    for line in ilines:
        if line.strip().lower() == start_mark:
            # collect the lines up to (and consuming) the end mark
            yield list(itertools.takewhile(lambda s: s.strip().lower() != end_mark, ilines))
for group in get_lines_between_marks(open("file.txt"), "mark begin", "mark end"):
    for line in group:
        print(line, end="")
# Text 2
# Text 4
And now your second request is trivial, using a small iterlast helper:
from functools import reduce
def iterlast(it):
    # keep only the final item of the iterable
    return reduce(lambda x, y: y, it)
for line in iterlast(get_lines_between_marks(open("file.txt"), "mark begin", "mark end")):
    print(line, end="")
# Text 4
To output each block:
sed -n '/^Mark Begin$/{:a;n;/^Mark End$/b;p;ba}' inputfile
To output only the last block:
sed -n '${x;s/\n//;p};/^Mark Begin$/{x;s/.*//;x;:a;n;/^Mark End$/b;H;ba}' inputfile