Shell: list directories ordered by file count (including in subdirectories)
I've nearly reached my limit for the permitted number of files in my Linux home directory, and I'm curious about where all the files are.
In any directory I can use for example find . -type f | wc -l
to show a count of how many files are in that directory and in its subdirectories, but what I'd like is to be able to generate a complete list of all subdirectories (and sub-subdirectories etc.) each with a count of all files contained in it and its subdirectories - if possible ranked by count, descending.
E.g. if my file structure looks like this:
Home/
    file1.txt
    file2.txt
    Docs/
        file3.txt
        Notes/
            file4.txt
            file5.txt
        Queries/
            file6.txt
    Photos/
        file7.jpg
The output would be something like this:
7 Home
4 Home/Docs
2 Home/Docs/Notes
1 Home/Docs/Queries
1 Home/Photos
Any suggestions greatly appreciated. (Also a quick explanation of the answer, so I can learn from this!). Thanks.
I use the following command
find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
Which produces something like:
[root@ip-***-***-***-*** /]# find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
1 .autofsck
1 stat-nginx-access
1 stat-nginx-error
2 tmp
14 boot
88 bin
163 sbin
291 lib64
597 etc
841 opt
1169 root
2900 lib
7634 home
42479 usr
80964 var
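Reading that pipeline stage by stage (the trailing comments here are annotations, not part of the original command):

```shell
find . -xdev -type f |  # list regular files, without crossing filesystem boundaries (-xdev)
    cut -d "/" -f 2 |   # keep field 2 of each "/"-separated path, i.e. the top-level entry
    sort |              # group identical names together so uniq can count them
    uniq -c |           # collapse each group into "count name"
    sort -n             # order numerically by count, ascending
```

Note that files sitting directly in the starting directory appear under their own names (like ".autofsck" above), because field 2 of "./.autofsck" is the file name itself.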
This should work:
find ~ -type d -exec sh -c "fc=\$(find '{}' -type f | wc -l); echo -e \"\$fc\t{}\"" \; | sort -nr
Explanation: the command above runs "find ~ -type d" to find all subdirectories of the home directory. For each of them, it runs a short shell script that counts the total number of files under that subdirectory (using the "find $dir -type f | wc -l" command that you already know) and echoes the number followed by the directory name. The sort command then sorts by the total number of files, in descending order.
This is not the most efficient solution (you end up scanning the same directory many times), but I am not sure you can do much better with a one liner :-)
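One caveat: interpolating {} directly inside the sh -c string breaks on directory names containing quotes or other shell metacharacters. A safer sketch of the same one-liner passes the directory name as an argument instead (same logic, only the quoting differs):

```shell
find ~ -type d -exec sh -c '
    fc=$(find "$1" -type f | wc -l)   # count files under this directory, recursively
    printf "%s\t%s\n" "$fc" "$1"      # emit "count<TAB>directory"
' sh {} \; | sort -nr
```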
countFiles () {
    # call the recursive function, throw away stdout and send stderr to stdout,
    # then sort numerically
    countFiles_rec "$1" 2>&1 >/dev/null | sort -nr
}

countFiles_rec () {
    local -i nfiles
    local dir="$1"
    # count the number of files in this directory only
    nfiles=$(find "$dir" -mindepth 1 -maxdepth 1 -type f -print | wc -l)
    # loop over the subdirectories of this directory
    while IFS= read -r subdir; do
        # invoke the recursive function for each one
        # save the output in the positional parameters
        set -- $(countFiles_rec "$subdir")
        # accumulate the number of files found under the subdirectory
        (( nfiles += $1 ))
    done < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -print)
    # print the number of files here, to both stdout and stderr
    printf "%d %s\n" "$nfiles" "$dir" | tee /dev/stderr
}
countFiles Home
produces
7 Home
4 Home/Docs
2 Home/Docs/Notes
1 Home/Photos
1 Home/Docs/Queries
Simpler and more efficient:
find ~ -type f -exec dirname {} \; | sort | uniq -c | sort -nr
find . -type d -exec sh -c '(echo -n "{} "; ls {} | wc -l)' \; | sort -n -k 2
This is pretty efficient.
It will display the counts in ascending order (i.e. largest at the end). To get them in descending order, add the "-r" option to "sort".
If you run this command in the "/" directory, it will scan the entire filesystem and tell you what are the directories that contain the most files and sub-directories. It's a good way to see where all your inodes are being used.
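To check whether it really is inodes (rather than disk space) running out, df with the -i option reports inode usage per filesystem:

```shell
df -i ~    # inode totals, used, free, and use% for the filesystem holding your home directory
```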
Note: this will not work for directories that contain spaces, but you could modify it to work in that case, if it's a problem for you.
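A sketch of such a modification, assuming bash and GNU find: read the directory names NUL-delimited with -print0, and print the count first so the sort no longer depends on a space-free second column:

```shell
find . -type d -print0 |
    while IFS= read -r -d '' d; do
        # count directory entries (lines of ls output), as in the original answer
        printf '%d %s\n' "$(ls "$d" | wc -l)" "$d"
    done | sort -n
```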
See the following example: to sort by column 2 in reverse, use sort -k 2 -r. Here -k 2 means sort by column 2 (space-separated), and -r means reverse.
# ls -lF /mnt/sda1/var/lib/docker/165536.165536/aufs/mnt/ | sort -k 2 -r
total 972
drwxr-xr-x 65 165536 165536 4096 Jun 5 12:23 ad45ea3c6a03aa958adaa4d5ad6fc25d31778961266972a69291d3664e3f4d37/
drwxr-xr-x 19 165536 165536 4096 Jun 6 06:46 7fa7f957669da82a8750e432f034be6f0a9a7f5afc0a242bb00eb8024f77d683/
drwxr-xr-x 2 165536 165536 4096 May 8 02:20 49e067ffea226cfebc8b95410e90c4bad6a0e9bc711562dd5f98b7d755fe6efb/
drwxr-xr-x 2 165536 165536 4096 May 8 01:19 45ec026dd49c188c68b55dcf98fda27d1f9dd32f825035d94849b91c433b6dd3/
drwxr-xr-x 2 165536 165536 4096 Mar 13 06:08 0d6e95d4605ab34d1454de99e38af59a267960999f408f720d0299ef8d90046e/
drwxr-xr-x 2 165536 165536 4096 Mar 13 02:25 e9b252980cd573c78065e8bfe1d22f01b7ba761cc63d3dbad284f5d31379865a/
drwxr-xr-x 2 165536 165536 4096 Mar 13 02:24 f4aa333b9c208b18faf00b00da150b242a7a601693197c1f1ca78b9ab2403409/
drwxr-xr-x 2 165536 165536 4096 Mar 13 02:24 3946669d530695da2837b2b5ed43afa11addc25232b29cc085a19c769425b36b/
drwxr-xr-x 2 165536 165536 4096 Mar 11 11:11 44293f77f63806a58d9b97c3c9f7f1397b6f0935e236250e24c9af4a73b3e35b/
If, however, you are fine with the non-cumulative solution using dirname (see wjb's answer), then far more efficient is:
find ~ -type f -print0 | xargs -0 dirname | sort | uniq -c | sort -n
Note that this does not display empty directories. For those, you can run find ~ -type d -empty, if your version of find supports it.
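If you do want cumulative counts (as in the original question) without rescanning directories, the per-directory output can be rolled up to every ancestor with a short awk post-processing step. This is a sketch, assuming GNU tools (GNU dirname accepts multiple arguments from xargs):

```shell
find ~ -type f -print0 | xargs -0 dirname | sort | uniq -c |
    awk '{
        n = $1; sub(/^ *[0-9]+ /, "")         # split "count path" into n and the path
        path = $0
        while (path != "") {                  # credit n to the directory and every ancestor
            total[path] += n
            if (path ~ /\//) sub(/\/[^\/]*$/, "", path); else path = ""
        }
    }
    END { for (p in total) printf "%d %s\n", total[p], p }' | sort -nr
```

On the example tree from the question this reproduces the desired ranked output, each directory counting all files beneath it.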