Efficient writing to a Compact Flash in python
I'm writing a gui to do perform a glorified 'dd'.
I could just subprocess to 'dd' but I thought I might as well use python's open()
/read()
/write()
if I can as it'll let me display progress much more easily.
Prompted by this link here I have:
input = open('filename.img', 'rb')
output = open("/dev/sdc", 'wb')
while True:
buffer = input.read(1024)
if buffer:
output.write(buffer)
else:
break
input.close()
output.close()
...however it is horribly slow. Or at least far slower than dd
. (around 4-5x slower)
I had a play and noticed altering the number of bytes 'buffered' had a huge affect on the speed of completion. Raising it to 2048 for example seems to half the time taken. Perhaps going OT for SO here but I guess the flash has an optimum number of bytes to be written at once? Could anyone suggest how I discover this?
The image & card are 1Gb so I would very much like to return to the ~5 minutes dd took if possible. I appreciate that in all likelihood I won't match it.
开发者_如何学运维Rather than trial and error, would anyone be able to suggest a way to optimise the above code and reasoning as to why it works? Especially what value for input.read() for example?
One restriction: python 2.4.3 on linux (centos5) (please don't hurt me)
Speed depending on buffer size is unrelated to the specific characteristics of compact flash, but inherent to all I/O with (relatively) slow devices, even to all kinds of system calls. You should make the buffer size as large as possible without exhausting memory - 2MiB should be enough for a Flash drive.
You should use the time
and strace
utilities to determine why your program is slower. If time
shows a large user/real
(large meaning greater than 0.1
), you can optimize your Python interpreter - cpython 2.4 is pretty slow, and you're creating new objects all the time instead of writing into a preallocated buffer. If there is a significant difference in sys
timings, analyze the syscalls made by both programs (with strace
) and try to emit the ones dd
does.
Also note that you must call fsync
(or execute the sync
program) afterwards to measure the real time it took writing the file to disk (or open the output file with O_DIRECT
). Otherwise, the operating system will let your program exit and just keep all the written data in buffers that are then continually written out to the actual disk. To test that you're doing it right, remove the disk immediately after your program is finished. Note that the speed difference can be staggering. This effect is less noticeable if your disk(CF card) is way larger than the available physical memory.
So with a little help, I've removed the 'buffer' bit completely and added an os.fsync()
.
import os
input = open('filename.img', 'rb')
output = open("/dev/sdc", 'wb')
output.write(input.read())
input.close()
output.close()
outputfile.flush()
os.fsync(outputfile.fileno())
精彩评论