
Efficient variable byte iteration over a string in Python

I'm reading a large (500MB) binary file in Python and parsing it byte by byte into a Python data structure. This file represents a sparse data grid. Depending on the format sometimes I need to read one byte, two bytes, or four bytes at a time. For bureaucratic reasons, I'm required to do this in Python rather than C.

I'm looking for runtime efficient mechanisms to do this in Python. Below is a simplified example of what I'm doing now:

import struct

with open(filename, 'rb') as inFile:
    nCoords = struct.unpack('!i', inFile.read(4))[0]  # record count
    for i in range(nCoords):
        coord = struct.unpack('!hh', inFile.read(4))  # x, y coord
        nCrops = struct.unpack('!B', inFile.read(1))[0]  # number of crops
        for j in range(nCrops):
            cropId = struct.unpack('!B', inFile.read(1))[0]  # crop id

I'm wondering whether loading the whole file from disk into a string and then parsing that string would be more efficient than reading a few bytes at a time. Something like:

with open(filename,'rb') as inFile:
   wholeFile = inFile.read()

But I doubt that slicing wholeFile will be more efficient than what I'm already doing.

Is there a runtime-efficient mechanism in Python to read a file into a string and then iterate over it a few bytes at a time? (I've looked at StringIO, but it only seems to allow reading a line at a time, which isn't what I want here since the whole file is effectively one line.)
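For comparison, here is a minimal sketch of the whole-file approach described above: read the file once, then walk a running offset with `struct.unpack_from`, which reads fields in place without copying slices. The record layout is taken from the question; `parse_grid` is a hypothetical helper name.

```python
import struct

def parse_grid(buf):
    """Parse the sparse-grid records out of an in-memory bytes object.

    Hypothetical helper; the format (count, then per-record x/y shorts,
    crop count, and crop ids) is assumed from the question.
    """
    offset = 0
    nCoords = struct.unpack_from('!i', buf, offset)[0]
    offset += 4
    grid = {}
    for _ in range(nCoords):
        x, y = struct.unpack_from('!hh', buf, offset)  # x, y coord
        offset += 4
        nCrops = struct.unpack_from('!B', buf, offset)[0]
        offset += 1
        crops = list(struct.unpack_from('!%dB' % nCrops, buf, offset))
        offset += nCrops
        grid[(x, y)] = crops
    return grid

# Demo on a synthetic one-record buffer; with the real file you would
# pass inFile.read() instead.
demo = (struct.pack('!i', 1)
        + struct.pack('!hh', 3, -4)
        + struct.pack('!BBB', 2, 7, 9))
print(parse_grid(demo))  # {(3, -4): [7, 9]}
```

Because `unpack_from` takes an offset into the buffer, no intermediate byte strings are created, which avoids the per-call overhead of repeated small `read()` calls.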


Use mmap: memory-map the file and unpack fields directly from the mapping instead of calling read() for every field.
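A minimal sketch of the mmap approach: the OS pages the file in lazily, and `struct.unpack_from` accepts the mapping directly, so fields are decoded in place. `count_coords` is a hypothetical helper; only the 4-byte header from the question's format is read here.

```python
import mmap
import os
import struct
import tempfile

def count_coords(path):
    """Memory-map the file and read its leading 4-byte record count
    in place; the kernel pages in only the bytes actually touched.
    Hypothetical helper based on the question's format."""
    with open(path, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return struct.unpack_from('!i', mm, 0)[0]
        finally:
            mm.close()

# Demo against a throwaway file; with the real data, pass its path.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(struct.pack('!i', 5))
tmp.close()
n = count_coords(tmp.name)
os.unlink(tmp.name)
print(n)  # 5
```

For a 500MB file this avoids both holding a full copy in a Python bytes object and the system-call overhead of many small reads.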

