Get file size during os.walk

I am using os.walk to compare two folders and see if they contain the exact same files. However, this only checks the file names. I want to ensure the file sizes are the same, and report back if they're different. Can you get the file size from os.walk?


The same way you get a file's size without os.walk: with os.stat. os.walk only yields bare file names, so you need to remember to join each one with root first:

import os

for root, dirs, files in os.walk(some_directory):
    for fn in files:
        path = os.path.join(root, fn)  # fn alone is relative to root, not to the cwd
        size = os.stat(path).st_size  # in bytes

        # ...
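
To tie this back to the question, here is a minimal sketch of comparing two trees by file size. The helper names collect_sizes and compare_sizes are made up for illustration, and it assumes both trees are readable and contain no broken symlinks (os.stat would raise on those):

```python
import os

def collect_sizes(top):
    """Map each file's path relative to `top` to its size in bytes."""
    sizes = {}
    for root, dirs, files in os.walk(top):
        for fn in files:
            path = os.path.join(root, fn)
            sizes[os.path.relpath(path, top)] = os.stat(path).st_size
    return sizes

def compare_sizes(dir_a, dir_b):
    """Return (only_in_a, only_in_b, size_mismatches) as sorted lists of relative paths."""
    a, b = collect_sizes(dir_a), collect_sizes(dir_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    mismatched = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return only_a, only_b, mismatched
```

Keying on the relative path is what lets the two dictionaries be compared directly, since the absolute paths will always differ between the two folders.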


os.path.getsize(path) gives you the size of the file in bytes, but two files having the same size does not mean they are identical. To be sure, you could read the content of each file and compare an MD5 or other hash of it.
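
A rough sketch of that idea, assuming SHA-256 is acceptable in place of MD5 (hashlib supports both). The function names file_digest and same_content are only illustrative:

```python
import hashlib
import os

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files are never loaded into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a, path_b):
    """Cheap size check first; only hash when the sizes match."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return file_digest(path_a) == file_digest(path_b)
```

Checking the size first means the hash is only computed when it can actually change the answer.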


As others have said: you can get the size with stat. However, for doing comparisons between directories you can use filecmp.dircmp from the standard library.
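
For example, something along these lines walks a dircmp result recursively. The generator name report_differences and the labels it yields are my own, not part of filecmp. Note that dircmp compares files shallowly by default, so files with matching type, size and modification time are treated as equal without reading their contents:

```python
import filecmp
import os

def report_differences(dcmp, rel=""):
    """Recursively yield (kind, relative_path) for entries that differ."""
    for name in dcmp.left_only:
        yield "left_only", os.path.join(rel, name)
    for name in dcmp.right_only:
        yield "right_only", os.path.join(rel, name)
    for name in dcmp.diff_files:
        yield "different", os.path.join(rel, name)
    # subdirs maps each common subdirectory name to its own dircmp object
    for name, sub in dcmp.subdirs.items():
        yield from report_differences(sub, os.path.join(rel, name))
```

You would call it as list(report_differences(filecmp.dircmp(dir_a, dir_b))), where dir_a and dir_b are the two folders to compare.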


FYI, there is a more efficient solution in Python 3 (os.scandir was added in 3.5; using it in a with statement requires 3.6):

import os

with os.scandir(rootdir) as it:
    for entry in it:
        if entry.is_file():
            filepath = entry.path  # rootdir joined with the name; absolute only if rootdir is
            filesize = entry.stat().st_size

See os.DirEntry for more details about the variable entry.

Note that the above is not recursive (subfolders will not be explored). In order to get an os.walk-like behaviour, you might want to use the following:

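import os
from collections import namedtuple
from os.path import join as pathjoin
from os.path import normpath, realpath

# Lightweight record for each non-directory entry that scantree yields.
_wrap_entry = namedtuple('DirEntryWrapper', 'name path islink size')

def scantree(rootdir, follow_links=False, reldir='', visited=None):
    # visited is shared by all recursive calls so that symlink cycles are detected.
    if visited is None:
        visited = {realpath(rootdir)}
    rootdir = normpath(rootdir)
    with os.scandir(rootdir) as it:
        for entry in it:
            # is_dir() follows symlinks, so linked directories are filtered below.
            if entry.is_dir():
                if not entry.is_symlink() or follow_links:
                    absdir = realpath(entry.path)
                    if absdir in visited:
                        continue
                    visited.add(absdir)
                    yield from scantree(entry.path, follow_links,
                                        pathjoin(reldir, entry.name), visited)
            else:
                yield _wrap_entry(
                    pathjoin(reldir, entry.name),  # path relative to the top-level rootdir
                    entry.path,
                    entry.is_symlink(),
                    entry.stat().st_size)  # follows symlinks; raises on a broken link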

and use it as

for entry in scantree(rootdir, follow_links=False):
    filepath = entry.path 
    filesize = entry.size