Python x64 on Windows x64: file copy performance evaluation / problem
While writing a backup application, I evaluated file-copy performance on Windows.
I have several questions and would appreciate your opinions.
Thank you!
Lucas.
Questions:
1. Why is the performance so much slower when copying the 10 GiB file compared to the 1 GiB file?
2. Why is shutil.copyfile so slow?
3. Why is win32file.CopyFileEx so slow? Could this be because of the flag win32file.COPY_FILE_RESTARTABLE? However, it doesn't accept the int 1000 as the COPY_FILE_NO_BUFFERING flag, which is recommended for large files: http://msdn.microsoft.com/en-us/library/aa363852%28VS.85%29.aspx
4. Using an empty ProgressRoutine seems to have no impact compared to using no ProgressRoutine at all. Is that expected?
5. Is there an alternative, better-performing way to copy files while also getting progress updates?
Results for a 1 GiB and a 10 GiB file:
Method                  1082.1 MiB file    10216.7 MiB file
robocopy.exe            111.0 MiB/s        75.4 MiB/s
cmd.exe /c copy         95.5 MiB/s         60.5 MiB/s
shutil.copyfile         51.0 MiB/s         29.4 MiB/s
win32api.CopyFile       104.8 MiB/s        74.2 MiB/s
win32file.CopyFile      108.2 MiB/s        73.4 MiB/s
win32file.CopyFileEx A  14.0 MiB/s         13.8 MiB/s
win32file.CopyFileEx B  14.6 MiB/s         14.9 MiB/s
Test Environment:
Python:
ActivePython 2.7.0.2 (ActiveState Software Inc.) based on
Python 2.7 (r27:82500, Aug 23 2010, 17:17:51) [MSC v.1500 64 bit (AMD64)] on win32
source = mounted network drive
source_os = Windows Server 2008 x64
destination = local drive
destination_os = Windows Server 2008 R2 x64
Notes:
'robocopy.exe' and 'cmd.exe /c copy' were run using subprocess.call()
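A minimal sketch of how such an invocation might look (the exact robocopy/copy arguments, and the names src_dir, dst_dir, file_name, src_path, dst_path, are my assumptions; the post only states that subprocess.call() was used):

import subprocess

# robocopy takes a source dir, a destination dir, and a file pattern
subprocess.call(['robocopy', src_dir, dst_dir, file_name])
# 'cmd.exe /c copy': shell=True runs the command through cmd.exe
subprocess.call('copy "%s" "%s"' % (src_path, dst_path), shell=True)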
win32file.CopyFileEx A (using no ProgressRoutine):
def Win32_CopyFileEx_NoProgress(ExistingFileName, NewFileName):
    win32file.CopyFileEx(
        ExistingFileName,                           # PyUNICODE | File to be copied
        NewFileName,                                # PyUNICODE | Place to which it will be copied
        None,                                       # CopyProgressRoutine | A Python function that receives progress updates; can be None
        Data=None,                                  # object | An arbitrary object to be passed to the callback function
        Cancel=False,                               # boolean | Pass True to cancel a restartable copy that was previously interrupted
        CopyFlags=win32file.COPY_FILE_RESTARTABLE,  # int | Combination of COPY_FILE_* flags
        Transaction=None                            # PyHANDLE | Handle to a transaction as returned by win32transaction::CreateTransaction
    )
win32file.CopyFileEx B (using empty ProgressRoutine):
def Win32_CopyFileEx(ExistingFileName, NewFileName):
    win32file.CopyFileEx(
        ExistingFileName,                           # PyUNICODE | File to be copied
        NewFileName,                                # PyUNICODE | Place to which it will be copied
        Win32_CopyFileEx_ProgressRoutine,           # CopyProgressRoutine | A Python function that receives progress updates; can be None
        Data=None,                                  # object | An arbitrary object to be passed to the callback function
        Cancel=False,                               # boolean | Pass True to cancel a restartable copy that was previously interrupted
        CopyFlags=win32file.COPY_FILE_RESTARTABLE,  # int | Combination of COPY_FILE_* flags
        Transaction=None                            # PyHANDLE | Handle to a transaction as returned by win32transaction::CreateTransaction
    )

def Win32_CopyFileEx_ProgressRoutine(
        TotalFileSize,
        TotalBytesTransferred,
        StreamSize,
        StreamBytesTransferred,
        StreamNumber,
        CallbackReason,  # CALLBACK_CHUNK_FINISHED or CALLBACK_STREAM_SWITCH
        SourceFile,
        DestinationFile,
        Data):
    return win32file.PROGRESS_CONTINUE  # any win32file.PROGRESS_* constant may be returned
Question 3:
You are misinterpreting the COPY_FILE_NO_BUFFERING flag in Microsoft's API. It is not the decimal int 1000 but hex 1000 (0x1000, i.e. int value 4096). When you set CopyFlags = 4096, you get what is probably the fastest copy routine in a Windows environment. I use the same routine in my data backup code, which is very fast and transfers terabyte-sized data day after day.
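A minimal sketch of that suggestion, reusing the CopyFileEx call from the question (the constant value 0x1000 comes from WinBase.h; I define it by hand in case the pywin32 build does not export it):

import win32file

COPY_FILE_NO_BUFFERING = 0x1000  # hex 1000 from WinBase.h, i.e. 4096 -- not the decimal 1000

def copy_file_no_buffering(src, dst, progress_routine=None):
    win32file.CopyFileEx(
        src,                               # file to be copied
        dst,                               # destination path
        progress_routine,                  # optional progress callback; may be None
        Data=None,
        Cancel=False,
        CopyFlags=COPY_FILE_NO_BUFFERING,  # bypass the system cache for large copies
        Transaction=None
    )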
Question 4:
It doesn't matter much, since it is only a callback. But in general you should not put too much code inside it; keep it lean.
Question 5:
In my experience it is the fastest copy routine available in a standard Windows environment. There might be faster custom copy routines, but when using the plain Windows API nothing better can be found.
Question 1:
In all likelihood, it's because you're measuring the completion time differently.
I'm guessing that a 1 GiB file fits in RAM comfortably, so the OS is probably just caching it and telling your application the copy is done while most of it (perhaps all of it) is still unflushed in the kernel buffers.
The 10 GiB file, however, doesn't fit in RAM, so the OS must write (most of) it out before it reports the copy as finished.
If you want a meaningful measurement:
a) Clear the filesystem buffer cache before each run. If your OS doesn't provide a convenient way of doing this, reboot. (NB: Windows does not provide a convenient method; I believe there is a Sysinternals tool that does this, though.) In the case of a network filesystem, clear the cache on the server too.
b) Sync the file to disk after you've finished writing, before you measure the completion time (see the sketch after this list).
Then I expect you'll see more consistent times.
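For point b), a minimal sketch, assuming the destination file is written from Python (f_src and f_dest as in the copy loops elsewhere in this thread): os.fsync() asks the OS to flush the file's data to disk (on Windows it maps to _commit(), which calls FlushFileBuffers), so the timer only stops once the data has actually been written out.

import os
import time

start = time.time()
with open(f_src, "rb") as in_file, open(f_dest, "wb") as out_file:
    while True:
        buf = in_file.read(8 * 1024 * 1024)
        if not buf:
            break
        out_file.write(buf)
    out_file.flush()             # flush Python's userspace buffers
    os.fsync(out_file.fileno())  # block until the OS has written the data to disk
elapsed = time.time() - start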
To answer your question 2:
shutil.copyfile() is so slow because by default it uses a 16 KiB copy buffer. It eventually ends up in shutil.copyfileobj(), which looks like this:
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
In your case it's ping-ponging between reading 16 KiB and writing 16 KiB. If you were to use copyfileobj() directly on your multi-GiB files, but with a buffer of, say, 128 MiB, you would see drastically improved performance.
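For example (a minimal sketch; the 128 MiB buffer size is the figure suggested above, and f_src/f_dest are the source and destination paths):

import shutil

with open(f_src, "rb") as fsrc, open(f_dest, "wb") as fdst:
    shutil.copyfileobj(fsrc, fdst, length=128 * 1024 * 1024)  # 128 MiB buffer instead of the 16 KiB default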
Lucas, I find that the following approach works ~20% faster than win32file.CopyFile.
import io
import shutil

b = bytearray(8 * 1024 * 1024)  # I find 8-16 MiB works best for me; you can try increasing it
with io.open(f_src, "rb") as in_file:
    with io.open(f_dest, "wb") as out_file:
        while True:
            numread = in_file.readinto(b)
            if not numread:
                break
            out_file.write(memoryview(b)[:numread])  # only write the bytes actually read, not the whole buffer
            # status bar update here
shutil.copymode(f_src, f_dest)