Efficient variable byte iteration over a string in Python
I'm reading a large (500MB) binary file in Python and parsing it byte by byte into a Python data structure. The file represents a sparse data grid. Depending on the format, I sometimes need to read one byte, two bytes, or four bytes at a time. For bureaucratic reasons, I'm required to do this in Python rather than C.
I'm looking for runtime efficient mechanisms to do this in Python. Below is a simplified example of what I'm doing now:
import struct

with open(filename, 'rb') as inFile:
    nCoords = struct.unpack('!i', inFile.read(4))[0]
    for i in range(nCoords):
        coord = (struct.unpack_from('!h', inFile.read(2))[0],
                 struct.unpack_from('!h', inFile.read(2))[0])  # x, y coord
        nCrops = struct.unpack_from('!B', inFile.read(1))[0]  # n crops
        for j in range(nCrops):
            cropId = struct.unpack_from('!B', inFile.read(1))[0]  # crop id
I'm wondering if loading the whole file from disk into memory as a single bytes object and parsing that would be more efficient than reading a few bytes at a time. Something like:
with open(filename, 'rb') as inFile:
    wholeFile = inFile.read()
But I doubt that slicing wholeFile will be more efficient than what I'm already doing.
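For reference, here is a minimal sketch of what that whole-file approach could look like, using struct.unpack_from with a running offset instead of a file.read() call per field. It assumes the same hypothetical field layout as the snippet above; parse_grid and the sample buffer are illustrative names, not part of the original code.

```python
import struct

def parse_grid(data):
    # Walk an in-memory bytes object with an explicit offset,
    # avoiding one file.read() call per field.
    offset = 0
    n_coords = struct.unpack_from('!i', data, offset)[0]
    offset += 4
    result = []
    for _ in range(n_coords):
        # Both 2-byte coords in a single unpack call.
        x, y = struct.unpack_from('!hh', data, offset)
        offset += 4
        n_crops = struct.unpack_from('!B', data, offset)[0]
        offset += 1
        # All crop ids for this coord in one call as well.
        crop_ids = list(struct.unpack_from('!%dB' % n_crops, data, offset))
        offset += n_crops
        result.append(((x, y), crop_ids))
    return result

# Example buffer: one coordinate (3, 4) with two crop ids, 7 and 9.
sample = struct.pack('!ihhBBB', 1, 3, 4, 2, 7, 9)
print(parse_grid(sample))  # [((3, 4), [7, 9])]
```

Batching fields into one format string ('!hh', '!2B') also cuts the number of unpack calls, which is usually where the Python-level overhead goes.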
Is there a runtime-efficient mechanism in Python to read a file into a string and then iterate over it a few bytes at a time? (I've checked out StringIO, but it only seems to let me read a line at a time, which doesn't help here since the whole file is effectively one line.)
Would using mmap be a better approach?
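A hedged sketch of the mmap route, since struct.unpack_from accepts any buffer-like object, including a memory map. The file layout here is the same hypothetical one as above, written to a temp file purely for demonstration:

```python
import mmap
import os
import struct
import tempfile

# Write a small sample file: one coordinate (3, 4) with crop ids 7 and 9.
fd, path = tempfile.mkstemp()
os.write(fd, struct.pack('!ihhBBB', 1, 3, 4, 2, 7, 9))
os.close(fd)

with open(path, 'rb') as f:
    # Map the whole file read-only; the OS pages data in lazily, so even
    # a 500MB file is not copied into a Python object up front.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_coords = struct.unpack_from('!i', mm, 0)[0]
        x, y = struct.unpack_from('!hh', mm, 4)
        n_crops = struct.unpack_from('!B', mm, 8)[0]
        crop_ids = list(struct.unpack_from('!%dB' % n_crops, mm, 9))

print(n_coords, (x, y), crop_ids)  # 1 (3, 4) [7, 9]
os.remove(path)
```

The parsing code is identical to the bytes version; only the buffer source changes, which makes it easy to benchmark mmap against a plain read() of the whole file.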