How to grab a chunk of data from a file?
I want to grab a chunk of data from a file. I know the start line and the end line. I wrote the code below, but it's incomplete and I don't know how to take it further.
file = open(filename, 'r')
end_line = '### Leave a comment!'
star_line = 'Kill the master'
for line in file:
    if star_line in line:
        ??
startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []
with open("somefile") as f:
    for line in f:
        # Switch collecting on at the start marker, off at the end marker.
        if line.startswith(startmarker): marking = True
        elif line.startswith(endmarker): marking = False
        if marking: result.append(line)
if len(result) > 1:
    # result[0] is the start-marker line itself, so skip it.
    print("".join(result[1:]))
Explanation: The with block is a nice way to use files -- it makes sure you don't forget to close() the file later. The for loop walks each line and:
- starts outputting when it sees a line that starts with 'ohai' (including that line)
- stops outputting when it sees a line that starts with 'meheer?' (without outputting that line).
After the loop, result contains the part of the file that is needed, plus the initial marker line. Rather than making the loop more complicated to ignore the marker, I just throw it out using a slice: result[1:] returns all elements in result starting at index 1; in other words, it excludes the first element (index 0).
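To try the loop above without a real file, here is a minimal sketch that wraps the same logic in a function. The function name lines_between and the sample text are made up for illustration; io.StringIO stands in for the open file.

```python
import io

def lines_between(lines, startmarker, endmarker):
    """Collect lines after a line starting with startmarker,
    up to (not including) a line starting with endmarker."""
    marking = False
    result = []
    for line in lines:
        if line.startswith(startmarker):
            marking = True
        elif line.startswith(endmarker):
            marking = False
        if marking:
            result.append(line)
    # The first collected line is the start-marker line; slice it off.
    return result[1:]

# Hypothetical sample data standing in for "somefile".
sample = io.StringIO("intro\nohai\nfirst\nsecond\nmeheer?\noutro\n")
chunk = lines_between(sample, "ohai", "meheer?")
print("".join(chunk))
```

Any iterable of lines works here, so an open file object can be passed in place of the StringIO.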
Update to handle partial-line matches:
startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []
with open("somefile") as f:
    for line in f:
        if not marking:
            # Look for the start marker anywhere in the line.
            index = line.find(startmarker)
            if index != -1:
                marking = True
                result.append(line[index:])
        else:
            # Look for the rightmost end marker anywhere in the line.
            index = line.rfind(endmarker)
            if index != -1:
                marking = False
                result.append(line[:index + len(endmarker)])
            else:
                result.append(line)
print("".join(result))
Yet more explanation: marking still tells us whether we should be outputting whole lines, but I've changed the if statements for the start and end markers as follows:
- if we're not (yet) marking, and we see the startmarker, then output the current line starting at the marker. The find method returns the position of the first occurrence of startmarker in this case. The line[index:] notation means 'the content of line starting at position index'.
- while marking, just output the current line entirely unless it contains endmarker. Here, we use rfind to find the rightmost occurrence of endmarker, and the line[...] notation means 'the content of line up to position index (the start of the match) plus the marker itself.' Also: stop marking now :)
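The two slices described above can be checked in isolation. This is only a small illustration; the two sample strings are invented and are not from the original file.

```python
startmarker = "ohai"
endmarker = "meheer?"

# Hypothetical sample lines, made up for illustration.
start_line = "noise before ohai and the rest\n"
end_line = "last bit meheer? trailing noise\n"

# find() returns the index of the first occurrence, or -1 if absent.
start_index = start_line.find(startmarker)
from_marker = start_line[start_index:]

# rfind() returns the index of the rightmost occurrence, or -1 if absent.
end_index = end_line.rfind(endmarker)
through_marker = end_line[:end_index + len(endmarker)]

print(repr(from_marker))     # 'ohai and the rest\n'
print(repr(through_marker))  # 'last bit meheer?'
```

Note that the end slice drops the rest of the line, including its newline.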
If reading the whole file into memory is not a problem, I would use file.readlines() to read all the lines into a list of strings.
Then you can use list_of_lines.index(value) to find the indices of the first and last lines, and then select all the lines between these two indices.
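A rough sketch of that approach follows. The helper name chunk_by_index and the throwaway demo file are assumptions for illustration; the marker strings are the ones from the question. One thing to watch: index() needs an exact match, so the value must include the trailing newline, and it raises ValueError when the line is missing.

```python
import os
import tempfile

def chunk_by_index(path, start_line, end_line):
    """Return the lines from start_line through end_line, inclusive."""
    with open(path) as f:
        list_of_lines = f.readlines()
    # readlines() keeps the newline on each line, so add it to the values.
    first = list_of_lines.index(start_line + "\n")
    # Search for the end line only from the start line onwards.
    last = list_of_lines.index(end_line + "\n", first)
    return list_of_lines[first:last + 1]

# Throwaway demo file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("intro\nKill the master\nbody 1\nbody 2\n"
              "### Leave a comment!\noutro\n")
    demo_path = tmp.name

chunk = chunk_by_index(demo_path, "Kill the master", "### Leave a comment!")
os.remove(demo_path)
print("".join(chunk))
```

Use list_of_lines[first + 1:last] instead if the marker lines themselves should be left out.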
First, a test file (assuming Bash shell):
for i in {0..100}; do echo "line $i"; done > test_file.txt
That generates a 101-line file with the lines line 0\nline 1\n ... line 100\n
This Python script captures the lines starting at and including mark1, up to and not including mark2:
#!/usr/bin/env python
mark1 = "line 22"
mark2 = "line 26"
record = False
error = False
buf = []
with open("test_file.txt") as f:
    for line in f:
        # Start recording at the first exact match of mark1.
        if mark1 == line.rstrip():
            if not error and not record:
                record = True
        # mark2 before mark1 (or a second mark2) counts as an error.
        if mark2 == line.rstrip():
            if not record:
                error = True
            else:
                record = False
        if record and not error:
            buf.append(line)
if len(buf) > 1 and not error:
    print("".join(buf))
else:
    print("There was an error in there...")
Prints:
line 22
line 23
line 24
line 25
in this case. If both marks are not found in the correct sequence, it will print an error.
If the size of the file between the marks is excessive, you may need some additional logic. You can also use a regex for each line instead of an exact match if that fits your use case.
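As one possible take on both points, here is a sketch that matches each line with a regex and yields lines lazily instead of building a buffer. The generator name iter_chunk and the patterns are assumptions for illustration, and unlike the script above it does not report an error when the marks are missing or out of order.

```python
import io
import re

def iter_chunk(lines, start_pattern, end_pattern):
    """Lazily yield lines from the first start match up to,
    but not including, the next end match."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    recording = False
    for line in lines:
        if not recording:
            if start_re.search(line):
                recording = True
                yield line
        elif end_re.search(line):
            return  # stop reading as soon as the end mark is seen
        else:
            yield line

# Stand-in for test_file.txt: "line 0" through "line 100".
sample = io.StringIO("".join("line %d\n" % i for i in range(101)))
chunk = list(iter_chunk(sample, r"^line 22$", r"^line 26$"))
print("".join(chunk))
```

Because it is a generator, the caller can write each line out as it arrives rather than collecting everything with list().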