开发者

What is the fastest / easiest way to count large number of files in a directory (in Linux)?

I had some directory, with large number of files. Every time I tried to access the list of files within it, I was not able to do that or there was significant delay. I was trying to use ls command within command-line on Linux and web interface from my hosting provider did not help also.

The problem is, that when I just do ls, it takes significant amount of time to even start displaying something. Thus, ls | wc -l would no开发者_运维问答t help also.

After some research I came up with this code (in this example it counts number of new emails on some server):

print sum([len(files) for (root, dirs, files) in walk('/home/myname/Maildir/new')])

The above code is written in Python. I used Python's command-line tool and it worked pretty fast (returned result instantly).

I am interested in the answer to the following question: is it possible to count files in a directory (without subdirectories) faster? What is the fastest way to do that?


ls does a stat(2) call for every file. Other tools, like find(1) and the shell wildcard expansion, may avoid this call and just do readdir. One shell command combination that might work is find dir -maxdepth 1|wc -l, but it will gladly list the directory itself and miscount any filename with a newline in it.

From Python, the straight forward way to get just these names is os.listdir(directory). Unlike os.walk and os.path.walk, it does not need to recurse, check file types, or make further Python function calls.

Addendum: It seems ls doesn't always stat. At least on my GNU system, it can do only a getdents call when further information (such as which names are directories) is not requested. getdents is the underlying system call used to implement readdir in GNU/Linux.

Addition 2: One reason for a delay before ls outputs results is that it sorts and tabulates. ls -U1 may avoid this.


This should be pretty fast in Python:

from os import listdir
from os.path import isfile, join
directory = '/home/myname/Maildir/new'
print sum(1 for entry in listdir(directory) if isfile(join(directory,entry)))


Total number of files in the given directory

find . -maxdepth 1 -type f | wc -l

Total number of files in the given directory and all subdirectories under it

find . -type f | wc -l

For more details drop into a terminal and do man find


I'm not sure about speed, but if you want to just use shell builtins this should work:

#!/bin/sh
COUNT=0;
for file in /path/to/directory/*
do
COUNT=$(($COUNT+1));
done
echo $COUNT


I think ls is spending most of its time before displaying the first line because it has to sort the entries, so ls -U should display the first line much faster (though it may not be that much better in total).


The fastest way would be to avoid all the overhead of interpreted languages and write some code that directly addresses your problem. Doing so is difficult to do in a portable way, but pretty straightforward. At the moment I'm on an OS X box, but converting the following to Linux should be extremely straightforward. (I opted to ignore hidden files and only count regular files...modify as necessary or add command line switches to get the functionality you want.)


#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int
main( int argc, char **argv )
{
    DIR *d;
    struct dirent *f;
    int count = 0;
    char *path = argv[ 1 ];

    if( path == NULL ) {
        fprintf( stderr, "usage: %s path", argv[ 0 ]);
        exit( EXIT_FAILURE );
    }
    d = opendir( path );
    if( d == NULL ) { perror( path );exit( EXIT_FAILURE ); }
    while( ( f = readdir( d ) ) != NULL ) {
        if( f->d_name[ 0 ] != '.'  &&  f->d_type == DT_REG )
            count += 1;
    }
    printf( "%d\n", count );
    return EXIT_SUCCESS;
}


My use case is a linux SBC (Banana Pi) counting files in a directory on a FAT32 USB stick. In a shell, doing

ls -U {dir} | wc -l

takes 6.4secs with 32k files in there (32k = max files/dir on FAT32) From python doing

t=time.time() ; print len(os.listdir(d)) ; print time.time()-t

takes only 0.874secs(!) Can't see anything else in Python being quicker than that.


A shorter way of counting files in a directory in bash:

files=(*) ; echo ${#files[@]}

I generate 10_000 empty files in tmpfs; it takes 0.03s on my machine to count them, running ls | wc -l was just slightly slower (I flushed the cache before and in between just in case)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜