How to read lines from a file in Python starting from the end
I need to know how to read lines from a file in Python so that I read the last line first and continue in that fashion until the cursor reaches the beginning of the file. Any ideas?
This problem, reading a text file in reverse line by line, can be solved in at least three ways.
The core difficulty is that, because each line can have a different length, you can't know beforehand where each line starts in the file, nor how many lines there are. This means you need to apply some logic to the problem.
General approach #1: Read the entire file into memory
With this approach, you simply read the entire file into memory, in some data structure that subsequently allows you to process the list of lines in reverse. A stack, a doubly linked list, or even an array can do this.
Pros: Really easy to implement (Python's built-in readlines() and reversed() do most of the work)
Cons: Uses memory proportional to the file size, and can take a while to read large files
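A minimal sketch of this first approach, assuming a UTF-8 text file small enough to fit in memory (the function name lines_last_first is made up for illustration). A Python list plays the role of the array:

```python
def lines_last_first(path, encoding="utf-8"):
    """Return an iterator over the lines of `path`, last line first."""
    # Read every line into a list; memory use grows with the file size.
    with open(path, encoding=encoding) as f:
        lines = f.readlines()
    # reversed() walks the list backwards without making a copy of it.
    return reversed(lines)
```

Each line keeps its trailing newline, exactly as readlines() returns it.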
General approach #2: Read the entire file, store position of lines
With this approach, you also read through the entire file once, but instead of storing all the text in memory, you only store the byte offsets at which each line starts. You can keep these offsets in the same kind of data structure that held the lines in the first approach.
Whenever you want to read line X, you seek to the offset you stored for the start of that line and re-read the line from the file.
Pros: Almost as easy to implement as the first approach
Cons: Still reads through the whole file once, which can take a while for large files
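A minimal sketch of the second approach, assuming the file is opened in binary mode so that offsets are plain byte counts (the function name is hypothetical). The first pass records where each line starts; the second pass seeks to those offsets in reverse order and re-reads one line at a time:

```python
def lines_by_offset_reversed(path):
    """Yield the lines of `path` as bytes, last line first."""
    offsets = []
    with open(path, "rb") as f:
        # Pass 1: remember the byte offset at which each line starts.
        pos = 0
        for line in f:
            offsets.append(pos)
            pos += len(line)
        # Pass 2: jump back to each recorded start and re-read that line.
        for start in reversed(offsets):
            f.seek(start)
            yield f.readline()
```

Only one integer per line stays in memory, but the whole file is still scanned before the first line comes out. Call line.decode("utf-8") (or your file's encoding) if you need text rather than bytes.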
General approach #3: Read the file in reverse, and "figure it out"
With this approach, you read the file block-wise (or similar) from the end and look for the line endings. You basically keep a buffer of, say, 4096 bytes and process the last line in that buffer. When your processing, which moves backward one line at a time through the buffer, reaches the start of the buffer, you read another buffer's worth of data from the area just before the first buffer and continue processing.
This approach is generally more complicated, because you need to handle such things as lines being broken over two buffers, and long lines could even cover more than two buffers.
It is, however, the one that would require the least amount of memory, and for really large files, it might also be worth doing this to avoid reading through gigabytes of information first.
Pros: Uses little memory, does not require you to read the entire file first
Cons: Much harder to implement and get right for all corner cases
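The sketch below shows the block-wise idea under a few assumptions: the file is read in binary mode, lines are separated by b"\n" (a file with \r\n endings leaves a trailing b"\r" on each line), and the name reverse_lines and the 4096-byte default block size are illustrative choices. The piece before the first newline in each block may be only the end of a line that starts further back in the file, so it is held back and joined with the next block read:

```python
import os

def reverse_lines(path, blocksize=4096):
    """Yield the lines of `path` as bytes (without newlines), last line first."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # everything before pos is still unread
        if pos == 0:
            return  # empty file: no lines at all
        # Skip a single trailing newline so no empty "last line" is produced.
        f.seek(pos - 1)
        if f.read(1) == b"\n":
            pos -= 1
        remainder = b""  # leading, possibly incomplete, line of the text read so far
        while pos > 0:
            size = min(blocksize, pos)
            pos -= size
            f.seek(pos)
            pieces = (f.read(size) + remainder).split(b"\n")
            # pieces[0] may continue into an earlier block, so keep it for later.
            remainder = pieces.pop(0)
            for line in reversed(pieces):
                yield line
        yield remainder  # the first line of the file
```

Memory use stays around one block plus the longest line, and a line longer than the block size is simply accumulated in remainder over several reads.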
There are numerous links on the net that show how to do the third approach:
- ActiveState Recipe 120686 - Read a text file backwards
- ActiveState Recipe 439045 - Read a text file backwards (yet another implementation)
- Top4Download.com Script - Read a text file backwards
You can also use the Python module file_read_backwards, which reads the file in a memory-efficient manner. It works with Python 2.7 and 3.
It supports the "utf-8", "latin-1", and "ascii" encodings, and it works with "\r", "\n", and "\r\n" as line endings.
After installing it via pip install file_read_backwards (v1.2.1), you can read the entire file backwards, line by line, with:
#!/usr/bin/env python
from file_read_backwards import FileReadBackwards

# Iterating over frb yields the lines of the file, last line first.
with FileReadBackwards("/path/to/file", encoding="utf-8") as frb:
    for line in frb:
        print(line)
Further documentation can be found at http://file-read-backwards.readthedocs.io/en/latest/readme.html
A straightforward way is to first create a temporary reversed file, then reverse each line of that file.
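import os
import tempfile

def reverse_file(in_filename, fout, blocksize=1024):
    """Write the bytes of in_filename to fout in reverse order."""
    filesize = os.path.getsize(in_filename)
    with open(in_filename, 'rb') as fin:
        # Visit the blocks from the last one back to the first.
        for i in range(filesize // blocksize, -1, -1):
            fin.seek(i * blocksize)
            data = fin.read(blocksize)
            # Reversing each block, in last-to-first order, reverses the whole file.
            fout.write(data[::-1])

def enumerate_reverse_lines(in_filename, blocksize=1024):
    """Yield the lines of in_filename as bytes, last line first."""
    with tempfile.TemporaryFile() as fout:
        reverse_file(in_filename, fout, blocksize=blocksize)
        fout.seek(0)
        for line in fout:
            # Each line of the reversed file is an original line, reversed.
            yield line[::-1]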
The above code yields lines as bytes (both files are opened in binary mode), with each newline at the beginning of a line instead of the end, and it makes no attempt to handle DOS/Windows-style newlines (\r\n).
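If you would rather get ordinary lines out of enumerate_reverse_lines, a small post-processing generator can move the newlines back out. This is a sketch assuming the bytes output described above; the name tidy_reversed_lines is hypothetical, and \r characters are left untouched. The first piece holds whatever follows the file's final newline (nothing, for a file that ends with a newline), and an empty first line of the file produces no piece at all, so both cases are handled explicitly:

```python
def tidy_reversed_lines(pieces):
    """Turn enumerate_reverse_lines output into lines without newlines, last first."""
    first = True
    starts_with_newline = False
    for piece in pieces:
        starts_with_newline = piece.startswith(b"\n")
        if starts_with_newline:
            piece = piece[1:]  # drop the newline that ended up in front
        if first:
            first = False
            if not piece:
                continue  # nothing follows the file's final newline
        yield piece
    if starts_with_newline:
        # Only the file's first line lacks a leading newline; if the last
        # piece had one, the first line was empty and never produced.
        yield b""
```

For example, tidy_reversed_lines(enumerate_reverse_lines("whatever.txt")) yields the lines of whatever.txt, last first, without their newlines.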
This solution is simpler than any others I've seen.
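def xreadlines_reverse(f, blksz=524288):
    "Act as a generator to return the lines in file f in reverse order."
    # f must be opened in binary mode ('rb'), so lines are returned as bytes.
    # Absolute seeks are used because Python 3 text files can't seek
    # relative to the current position.
    buf = b""
    f.seek(0, 2)    # jump to the end of the file
    pos = f.tell()  # number of bytes before buf that are still unread
    if pos == 0:
        pos = -1    # empty file: nothing to yield
    while pos != -1:
        # Find the last newline in buf, ignoring one that ends buf.
        nlpos = buf.rfind(b"\n", 0, -1)
        if nlpos != -1:
            line = buf[nlpos + 1:]
            if not line.endswith(b"\n"):
                line += b"\n"  # the file's last line had no newline
            buf = buf[:nlpos + 1]
            yield line
        elif pos == 0:
            pos = -1  # start of file reached: buf holds the first line
            yield buf
        else:
            # Read the block just before buf and prepend it.
            n = min(blksz, pos)
            pos -= n
            f.seek(pos)
            buf = f.read(n) + buf
Example usage:
with open("whatever.txt", "rb") as f:
    for line in xreadlines_reverse(f):
        do_stuff(line)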