How to grab a chunk of data from a file?

I want to grab a chunk of data from a file. I know the start line and the end line. I wrote the code below, but it's incomplete and I don't know how to take it further.

file = open(filename, 'r')
end_line = '### Leave a comment!'
star_line = 'Kill the master'
for line in file:
    if star_line in line:
        ??


startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []

with open("somefile") as f:
  for line in f:
    if line.startswith(startmarker): marking = True
    elif line.startswith(endmarker): marking = False

    if marking: result.append(line)

if len(result) > 1:
  print("".join(result[1:]))

Explanation: The with block is a nice way to use files -- it makes sure you don't forget to close() it later. The for walks each line and:

  • starts outputting when it sees a line that starts with 'ohai' (including that line)
  • stops outputting when it sees a line that starts with 'meheer?' (without outputting that line).

After the loop, result contains the part of the file that is needed, plus that initial marker. Rather than making the loop more complicated to ignore the marker, I just throw it out using a slice: result[1:] returns all elements in result starting at index 1; in other words, it excludes the first element (index 0).
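The same idea can be wrapped in a function so it is easy to try on in-memory text. This is only a sketch, assuming Python 3; the name lines_between and the sample text are made up for illustration. Unlike the loop above, it skips the start-marker line inside the loop instead of slicing it off afterwards, and it stops at the first end marker:

```python
import io

def lines_between(lines, startmarker, endmarker):
    """Return the lines after the start-marker line, up to (not including) the end-marker line."""
    result = []
    marking = False
    for line in lines:
        if not marking:
            if line.startswith(startmarker):
                marking = True   # start collecting with the NEXT line
            continue
        if line.startswith(endmarker):
            break                # stop at the first end marker
        result.append(line)
    return result

# Hypothetical sample data standing in for a real file.
sample = io.StringIO("intro\nohai there\nfirst\nsecond\nmeheer? bye\noutro\n")
chunk = lines_between(sample, "ohai", "meheer?")
print("".join(chunk), end="")
```

Any iterable of lines works as the first argument, so an open file object can be passed in directly.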

Update to handle partial-line matches:

startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []

with open("somefile") as f:
  for line in f:
    if not marking:
      index = line.find(startmarker)
      if index != -1:
        marking = True
        result.append(line[index:])
    else:
      index = line.rfind(endmarker)
      if index != -1:
        marking = False
        result.append(line[:index + len(endmarker)])
      else:
        result.append(line)

print("".join(result))

Yet more explanation: marking still tells us whether we should be outputting whole lines, but I've changed the if statements for the start and end markers as follows:

  • if we're not (yet) marking, and we see the startmarker, then output the current line starting at the marker. The find method returns the position of the first occurrence of startmarker in this case. The line[index:] notation means 'the content of line starting at position index.'

  • while marking, just output the current line entirely unless it contains endmarker. Here, we use rfind to find the rightmost occurrence of endmarker, and the line[...] notation means 'the content of line up to position index (the start of the match) plus the marker itself.' Also: stop marking now :)
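One case the loop above does not cover is both markers sitting on the same line: once the start marker is found, that line is never checked for the end marker. A possible variant, assuming Python 3 (the function name extract_span is made up for illustration), looks for the end marker in the remainder of the start line as well and stops after the first complete span:

```python
def extract_span(lines, startmarker, endmarker):
    """Return the text from the first startmarker through the end marker, both inclusive."""
    result = []
    marking = False
    for line in lines:
        if not marking:
            index = line.find(startmarker)
            if index == -1:
                continue
            marking = True
            line = line[index:]              # drop text before the start marker
            search_from = len(startmarker)   # only look for the end after the start marker
        else:
            search_from = 0
        # rfind, as above: rightmost occurrence of the end marker on this line
        end = line.rfind(endmarker, search_from)
        if end != -1:
            result.append(line[:end + len(endmarker)])
            break
        result.append(line)
    return "".join(result)

print(extract_span(["xx ohai a meheer? yy\n"], "ohai", "meheer?"))
```

If the end marker never shows up, this returns everything from the start marker to the end of the input, which matches what the loop above does.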


If reading the whole file is not a problem, I would use file.readlines() to read all the lines into a list of strings.

Then you can use list_of_lines.index(value) to find the indices of the first and last line, and select all the lines between these two indices. Note that readlines() keeps the trailing newline on each string, so the value passed to index() has to include it.
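A rough sketch of that approach, assuming Python 3 and that the start and end lines are known exactly (the function name slice_by_lines and the START/END sample content are made up for illustration):

```python
import os
import tempfile

def slice_by_lines(path, first, last):
    """Return the lines from `first` through `last`, both inclusive."""
    with open(path) as f:
        lines = f.readlines()            # every string keeps its trailing "\n"
    start = lines.index(first)           # raises ValueError if the line is absent
    end = lines.index(last, start + 1)   # search only after the start line
    return lines[start:end + 1]

# Hypothetical sample file, written to a temporary location for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nSTART\nb\nc\nEND\nd\n")
try:
    selected = slice_by_lines(tmp.name, "START\n", "END\n")
finally:
    os.remove(tmp.name)

print("".join(selected), end="")
```

index() only matches whole list elements, so this works when the complete line is known; for substring matches a loop like the ones above is still needed.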


First, a test file (assuming Bash shell):

for i in {0..100}; do  echo "line $i"; done > test_file.txt

That generates a 101-line file with the lines line 0\nline 1\n ... line 100\n

This Python script captures the lines from mark1 (inclusive) up to mark2 (exclusive):

#!/usr/bin/env python

mark1 = "line 22"
mark2 = "line 26"
record = False
error = False
buf = []

with open("test_file.txt") as f:
    for line in f:
        stripped = line.rstrip()

        # Start recording at mark1, unless an error was already seen.
        if stripped == mark1 and not error and not record:
            record = True

        # Stop recording at mark2; seeing mark2 while not recording is an error.
        if stripped == mark2:
            if not record:
                error = True
            else:
                record = False

        if record and not error:
            buf.append(line)

if buf and not error:
    print("".join(buf))
else:
    print("There was an error in there...")

Prints:

line 22
line 23
line 24
line 25

in this case. If both marks are not found in the correct sequence, it will print an error.

If the amount of data between the marks is very large, holding it all in buf may use too much memory, so you may need some additional logic (for example, writing lines out as they are read). You can also use a regex for each line instead of an exact match if that fits your use case.
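Both ideas can be combined in a generator that matches each line against a regex and hands lines back one at a time instead of buffering them. This is just a sketch, assuming Python 3; the name iter_between and the patterns are made up for illustration, and the generated lines stand in for test_file.txt:

```python
import re

def iter_between(lines, start_pattern, end_pattern):
    """Yield lines from the first start match up to, not including, the next end match."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    recording = False
    for line in lines:
        if not recording:
            if start_re.search(line):
                recording = True
                yield line
        elif end_re.search(line):
            return               # stop reading as soon as the end mark is seen
        else:
            yield line

# Same content as test_file.txt, generated in memory for the demo.
lines = ("line %d\n" % i for i in range(101))
for line in iter_between(lines, r"^line 22$", r"^line 26$"):
    print(line, end="")
```

Passing an open file object as lines keeps memory use flat, since only one line is held at a time. Unlike the script above, this version does not report an error when the marks are missing or out of order.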
