read a very very big file with python
What is the best solution to process each line of a text file whose size is about 500 MB?
The approach I had in mind:
def files(mon_fichier):
    while True:
        data = mon_fichier.read(1024)
        if not data:
            break
        yield data

fichier = open('tonfichier.txt', 'r')
for bloc in files(fichier):
    print bloc
Thank you in advance
with open('myfile.txt') as inf:
    for line in inf:
        # do something
        pass
Just using the standard file operations should work, as long as you keep away from readlines() and instead just use readline().
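For reference, here is a minimal sketch of the explicit readline() loop hinted at above (the file name is a placeholder; the line-iteration idiom shown first is usually preferable):

with open('myfile.txt') as inf:  # placeholder file name
    while True:
        line = inf.readline()
        if not line:  # an empty string means end of file
            break
        # do something with line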
The answer depends on what you want to do with the data... I recommend reading by block and processing each block right after reading it, like:
fs = open(source, 'r')
while 1:
    txt = fs.readline(1000)
    if txt == "":
        break
    # <your treatment>
fs.close()
As far as I understand the process, reading a file goes through a buffer.
Under these conditions, mon_fichier.read(1024) doesn't fetch 1024 bytes directly from the file but from the buffer, until the buffer is exhausted; the buffer is then refilled by a new real read of, say, 4096, 8192, or 16384 bytes... I don't know the exact size (I think it's a power of 2, but I'm not even sure of that).
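If you want to check this on your own system, here is a small sketch (not from the original answers): the io module reports the default buffer size, and open() accepts a buffering argument as its third parameter:

import io

# Default block size used by buffered I/O (commonly 8192 bytes).
print(io.DEFAULT_BUFFER_SIZE)

# Open with an explicit 64 KB buffer; read(1024) is then served from it
# until it is exhausted and refilled by a real read from disk.
f = open('myfile.txt', 'rb', 65536)  # placeholder file name
chunk = f.read(1024)
f.close()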
So, if you really want to treat blocks of bytes, I think philnext's code is preferable. But readline(1000) must be replaced with read(1000) if you want to fetch exactly 1000 bytes; readline(1000) returns a single line, and no more, even if the line is only 4 characters long.
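To make the fix concrete, here is a small sketch of the corrected block loop using read(1000) (as in philnext's snippet, source stands for your own file path):

fs = open(source, 'r')  # 'source' is a placeholder path
while True:
    block = fs.read(1000)  # up to 1000 characters, regardless of line breaks
    if block == "":  # an empty string signals end of file
        break
    # <your treatment of block>
fs.close()

readline(1000) would instead stop at the first newline, so a 4-character line yields just those few characters even though far more data is available.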
Treating a file by blocks may be what you really want to do, but it seems uncommon to me. It is more frequent to treat a file line by line, and in that case Hugh Bothwell's code is the right approach.