Python: os.stat().st_size gives different value than du
I'm writing a utility that walks a directory tree, computing and storing the size of every file and subdirectory. However, the sizes it computes don't match what du reports.
Here's my class, which automatically recurses through all sub-directories:
    class directory:
        '''
        Class that automatically traverses directories
        and builds a tree with size info
        '''
        def __init__(self, path, parent=None):
            if path[-1] != '/':
                # Add trailing /
                self.path = path + '/'
            else:
                self.path = path
            self.size = 4096
            self.parent = parent
            self.children = []
            self.errors = []
            for i in os.listdir(self.path):
                try:
                    self.size += os.lstat(self.path + i).st_size
                    if os.path.isdir(self.path + i) and not os.path.islink(self.path + i):
                        a = directory(self.path + i, self)
                        self.size += a.size
                        self.children.append(a)
                except OSError:
                    self.errors.append(path + i)
I have a directory of videos that I'm testing this program with:
>>> a = directory('/var/media/television/The Wire')
>>> a.size
45289964053
However, when I try the same with du, I get
$ du -sx /var/media/television/The\ Wire
44228824
The directories don't contain any links or anything special.
Could someone explain why os.stat() is giving these unexpected size readings?
Platform:
- Linux (Fedora 13)
- Python 2.7
Consider this file foo
-rw-rw-r-- 1 unutbu unutbu 25334 2010-10-31 12:55 foo
It consists of 25334 bytes.
tune2fs tells me foo resides on a filesystem with block size 4096 bytes:
% sudo tune2fs -l /dev/mapper/vg1-OS1
...
Block size: 4096
...
Thus even a file whose contents are a single byte occupies 4096 bytes on this filesystem. As the file grows larger, space is allocated in 4096-byte blocks.
du reports
% du -B1 foo
28672 foo
Note that 28672/4096 = 7. This is saying that foo occupies seven 4096-byte blocks on the filesystem — the smallest number of blocks needed to hold 25334 bytes.
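The allocated size is also visible from Python via st_blocks, which on Linux counts 512-byte units regardless of the filesystem's block size. A minimal sketch (the helper names are mine, not from the question):

    import os

    def apparent_bytes(path):
        # Logical file length, as reported by st_size (what du -b shows)
        return os.stat(path).st_size

    def allocated_bytes(path):
        # Space actually occupied on disk; st_blocks is in 512-byte units on Linux
        return os.stat(path).st_blocks * 512

For the foo above, allocated_bytes would return 28672 while apparent_bytes returns 25334.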
% du foo
28 foo
This version of du reports sizes in 1024-byte units: 28672/1024 = 28.
du gives the size on disk by default, versus the apparent file size given in st_size.
$ du test.txt
8 test.txt
$ du -b test.txt
6095 test.txt
>>> os.stat('test.txt').st_size
6095
I would write this code as:
    import os, os.path

    def size_dir(d):
        file_walker = (
            os.path.join(root, f)
            for root, _, files in os.walk(d)
            for f in files
        )
        return sum(os.path.getsize(f) for f in file_walker)
If you want to count each directory as 4k, then do something like this:
    import os, os.path

    def size_dir(d):
        file_walker = (
            os.path.join(root, f)
            for root, _, files in os.walk(d)
            for f in files
        )
        dir_walker = (
            4096
            for root, dirs, _ in os.walk(d)
            for sub in dirs
        )
        return 4096 + sum(os.path.getsize(f) for f in file_walker) + sum(dir_walker)
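If the goal is to agree with plain du rather than du -b, a sketch that sums st_blocks instead of st_size (assuming Linux, where st_blocks is in 512-byte units; the function name is mine):

    import os

    def du_style_size(d):
        # Approximate `du -s` output (in bytes) by summing allocated blocks,
        # counting the top directory, subdirectories, and files
        total = os.lstat(d).st_blocks * 512
        for root, dirs, files in os.walk(d):
            for name in dirs + files:
                path = os.path.join(root, name)
                try:
                    total += os.lstat(path).st_blocks * 512
                except OSError:
                    pass  # unreadable entry; skip it, as du would warn and continue
        return total

Using os.lstat rather than os.stat means symlinks are counted by their own size instead of following them, which matches du's default behavior.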
On Linux (I am using CentOS), du -b reports bytes and implies --apparent-size, so it returns the length of the file rather than the amount of disk space it uses. Try that and see whether it agrees with what Python's os.stat says.
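A quick way to cross-check the two from Python (assuming du is on the PATH; the helper name is mine):

    import os
    import subprocess

    def compare_with_du(path):
        # Return (apparent size from `du -b`, st_size); for a regular
        # file the two values should be identical
        out = subprocess.check_output(['du', '-b', path])
        du_bytes = int(out.split()[0])
        return du_bytes, os.stat(path).st_size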