
Fastest Way To Calculate Directory Sizes

What is the best and fastest way to calculate directory sizes? For example, we will have the following structure:

/users
      /a
      /b
      /c
      /...

We need the output to be per user directory:

a = 1224KB
b = 3533KB
c = 3324KB
...

We plan on having tens, maybe even hundreds, of thousands of directories under /users. The following shell command works:

du -cms /users/a | grep total | awk '{print $1}'

But we will have to call it N times. The whole point is that the output, each user directory's size, will be stored in our database. Also, we would love to have it update as frequently as possible, but without blocking all the resources on the server. Is it even possible to have it calculate user directory sizes every minute? How about every 5 minutes?
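We could probably drive this from cron and throttle it with nice/ionice so it does not hog the server; a minimal sketch of a crontab entry (update_user_sizes.sh is a hypothetical script that would do the actual du run and database update):

# Run the scan every 5 minutes at the lowest CPU and I/O priority
*/5 * * * * nice -n 19 ionice -c3 /usr/local/bin/update_user_sizes.sh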

Now that I am thinking about it some more, would it make sense to use node.js? That way, we could calculate the directory sizes and insert them into the database all in one transaction. We could do that in PHP or Python as well, but we are not sure it would be as fast.
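For what it's worth, here is a rough shell-only sketch of the kind of pipeline we have in mind, assuming an SQLite database user_sizes.db with a table sizes(user TEXT PRIMARY KEY, kb INTEGER); the database and table names are just placeholders:

#!/bin/sh
# Summarize each user directory once with du (sizes in KB),
# turn every line into an INSERT, and load them in a single transaction.
{
  echo "BEGIN;"
  du -sk /users/* | awk -F'\t' -v q="'" '{
      name = $2; sub(/^\/users\//, "", name)   # strip the /users/ prefix
      gsub(q, q q, name)                       # escape single quotes for SQL
      printf "INSERT OR REPLACE INTO sizes (user, kb) VALUES (%s%s%s, %s);\n", q, name, q, $1
  }'
  echo "COMMIT;"
} | sqlite3 user_sizes.db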

Thanks.


Why not just:

du -sm /users/*

(The slowest part is still likely to be du traversing the filesystem to calculate the size, though).


What do you need this information for? If it's only for reminding the users that their home directories are too big, you should add quota limits to the filesystem. You can set the quota to 1000 GB if you just want the numbers without really limiting disk usage.

The numbers stay up to date automatically, since the filesystem maintains the quota accounting as files are written. The only downside is that quotas report the total size of the files owned by a particular user, rather than the size of the files under his home directory. But maybe you can live with that.
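For example, on a filesystem mounted with the usrquota option, something along these lines should give you per-user numbers; the user name alice and the 1000 GB limit are only illustrative:

# Build the quota files and turn accounting on (needs the usrquota mount option)
quotacheck -cum /users
quotaon /users

# Hard block limit of ~1000 GB (in 1K blocks), no inode limit, so this is effectively just accounting
setquota -u alice 0 1048576000 0 0 /users

# Report usage per user, in 1K blocks
repquota /users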


I think what you are looking for is:

du -cm --max-depth=1 /users | awk '{user = substr($2, 8);
                                    ans = user ": " $1;
                                    print ans}'

The magic number 8 makes substr start just past the /users/ prefix, so it gets stripped off (awk is not one of my best languages =D, but I am guessing that part is not going to be written in awk anyway). It's faster since you don't involve grepping for the total, and the loop is contained inside du. I bet it can be done faster, but this should be fast enough.


If you have multiple cores you can run the du commands in parallel.

For example (running from the folder you want to examine):

parallel du -sm ::: *

ls | xargs -n1 -P4 du -sm

[The number after the -P flag sets how many du processes run in parallel; set it to roughly the number of CPUs you want to use.]


Not that slow, and it will show you the folder sizes: du -sh /* > total.size.files.txt


The fastest way to analyze storage is with the ncdu package:

sudo apt-get install ncdu

Example command:

ncdu /your/directory/
