Python binary file reading problem
I'm trying to read a binary file (which represents a matrix in Matlab) in Python. But I am having trouble reading the file and converting the bytes to the correct values.
The binary file consists of a sequence of 4-byte numbers. The first two numbers are the number of rows and columns respectively. My friend gave me a Matlab function he wrote that does this using fwrite. I would like to do something like this:
f = open(filename, 'rb')
rows = f.read(4)
cols = f.read(4)
m = [[0 for c in cols] for r in rows]
r = c = 0
while True:
    if c == cols:
        r += 1
        c = 0
    num = f.read(4)
    if num:
        m[r][c] = num
        c += 1
    else:
        break
But whenever I use f.read(4), I get something like '\x00\x00\x00\x04' (this specific example should represent a 4), and I can't figure out how to convert it into the correct number (using int, hex or anything like that doesn't work). I stumbled upon struct.unpack, but that didn't seem to help very much.
Here is an example matrix and the corresponding binary file (as it appears when I read the entire file using the Python function f.read() without any size parameter) that the Matlab function created for it:
4 4 2 4
2 2 2 1
3 3 2 4
2 2 6 2
'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00'
So the first 4 bytes and the 5th-8th bytes should both be 4, as the matrix is 4x4, and then the values should be 4,4,2,4,2,2,2,1,etc...
Thanks guys!
rows = f.read(4)
cols = f.read(4)
After these two lines, both names are bound to 4-byte strings. To turn them into integers instead, read the 8 bytes in one go and unpack them:
import struct
rowsandcols = f.read(8)
rows, cols = struct.unpack('>ii', rowsandcols)  # '>' = big-endian, which matches '\x00\x00\x00\x04' == 4
See the docs for struct.unpack.
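As a small illustration of why the byte-order character matters, here is a sketch in Python 3 syntax that unpacks the first 8 bytes of the sample file both ways. The variable names are only for illustration; the byte string is taken from the question.

```python
import struct

# First 8 bytes of the sample file from the question.
header = b'\x00\x00\x00\x04\x00\x00\x00\x04'

# '>' forces big-endian, '<' forces little-endian; 'i' is a 4-byte signed int.
rows, cols = struct.unpack('>ii', header)
wrong_rows, wrong_cols = struct.unpack('<ii', header)

print(rows, cols)              # 4 4
print(wrong_rows, wrong_cols)  # 67108864 67108864
```

The second result is what a little-endian machine would also produce with a native-order format such as '=ii', so spelling out '>' is the safer choice for this file.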
I looked a bit more into your problem, since I had never used struct before, so it was a good learning activity. It turns out there are a couple of twists. First, the matrix values are not stored as 4-byte integers but as 4-byte floats in big-endian form (only the two header numbers are integers). Second, if your example is correct, the matrix was not stored by rows, as one would expect, but by columns. That is, it was output like so (pseudocode):
for j in cols:
    for i in rows:
        write Aij to file
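To check that reading of the format, the pseudocode can be turned into a small Python 3 sketch. The function write_matrix_bytes is a hypothetical stand-in for the Matlab writer (which is not shown in the question); it assumes a big-endian integer header followed by big-endian 4-byte floats, column by column.

```python
import struct

def write_matrix_bytes(matrix):
    """Hypothetical writer: big-endian int header, then float32 values by column."""
    rows, cols = len(matrix), len(matrix[0])
    out = struct.pack('>ii', rows, cols)
    for j in range(cols):          # outer loop over columns
        for i in range(rows):      # inner loop over rows
            out += struct.pack('>f', float(matrix[i][j]))
    return out

example = [[4, 4, 2, 4],
           [2, 2, 2, 1],
           [3, 3, 2, 4],
           [2, 2, 6, 2]]
data = write_matrix_bytes(example)
print(data[8:24])  # first column: 4.0, 2.0, 3.0, 2.0
```

Under these assumptions the output is byte-for-byte the string posted in the question, which supports the column-by-column interpretation.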
So I had to transpose the result after reading. Here is the code that you need given the example:
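import struct

def readMatrix(f):
    # header: two big-endian 4-byte integers (rows, then columns)
    rows, cols = struct.unpack('>ii', f.read(8))
    # body: one column at a time, each holding `rows` big-endian 4-byte floats
    m = [list(struct.unpack('>%df' % rows, f.read(4 * rows)))
         for c in range(cols)]
    # transpose the columns into rows before returning
    return list(zip(*m))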
And here we test it:
>>> from StringIO import StringIO
>>> f = StringIO('\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')
>>> mat = readMatrix(f)
>>> for row in mat:
...     print row
...
(4.0, 4.0, 2.0, 4.0)
(2.0, 2.0, 2.0, 1.0)
(3.0, 3.0, 2.0, 4.0)
(2.0, 2.0, 6.0, 2.0)
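The session above is Python 2 (the StringIO module and the print statement). For Python 3, a roughly equivalent sketch could look like the following; read_matrix is just an illustrative name, and io.BytesIO stands in for a file opened with mode 'rb'.

```python
import io
import struct

def read_matrix(f):
    """Read a big-endian int header, then column-major big-endian float32 data."""
    rows, cols = struct.unpack('>ii', f.read(8))
    columns = [struct.unpack('>%df' % rows, f.read(4 * rows))
               for _ in range(cols)]
    # zip() is lazy in Python 3, so build the transposed rows explicitly.
    return [list(row) for row in zip(*columns)]

sample = (b'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00'
          b'@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00'
          b'@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00'
          b'@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')

mat = read_matrix(io.BytesIO(sample))
for row in mat:
    print(row)
```

With a real file, the same function would be called as read_matrix(open(filename, 'rb')), ideally inside a with block so the file gets closed.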