Finding many thousands of files in a directory pattern in Perl

I would like to find files matching a name pattern under a directory pattern in Perl. The search will return many thousands of entries, like this:

find ~/mydir/*/??/???/???? -name "*.$refinfilebase.search" -print

I've been told there are different ways to handle it, for example:

File::Find
glob()
opendir, readdir, grep
Diamond operator, e.g.: my @files = <$refinfilebase.search>

Which one would be the most suitable if the script has to run on older versions of Perl or on minimal Perl installations?


For very large directories, opendir() is probably the safest choice, as it doesn't need to read the whole listing in or do any filtering on it. It can also be faster because the entries come back in no particular order; sorting them can be a performance hit on very large directories on some operating systems. opendir is also built in, so it is available on every system.

Note that its exact behaviour may differ between platforms, so you need to code carefully with it. This mainly affects what it returns for entries such as the current and parent directory (. and ..), which you may need to treat specially.
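As a minimal sketch of that approach, assuming Perl 5.6 or later (for the lexical directory handle): the helper name files_with_suffix and the '.search' suffix are only placeholders for illustration. It reads one entry at a time, skips . and .. explicitly, and keeps only the names with the wanted suffix.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entries of $dir whose names end in $suffix.
sub files_with_suffix {
    my ($dir, $suffix) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @found;
    # readdir in scalar context hands back one entry per call, unsorted.
    while (defined(my $entry = readdir $dh)) {
        # Skip the current and parent directory if the platform returns them.
        next if $entry eq '.' or $entry eq '..';
        # \Q...\E quotes the suffix so its dots are matched literally.
        next unless $entry =~ /\Q$suffix\E\z/;
        push @found, "$dir/$entry";
    }
    closedir $dh;
    return @found;
}

# Example call; the directory and suffix are placeholders.
print "$_\n" for files_with_suffix('.', '.search');
```

Because nothing is sorted, the order of the results depends on the file system.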

glob() is more useful when you only want some files, matching by a pattern. File::Find is more useful when recursing through a set of nested directories. If you don't need either, opendir() is a good base.
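For the recursive case, here is a rough sketch with File::Find, which ships with Perl. The helper name find_by_suffix and the suffix are made up for the example, and it assumes you want regular files only.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every regular file under $top whose name
# ends in $suffix, descending into all nested directories.
sub find_by_suffix {
    my ($top, $suffix) = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the bare file name and
        # $File::Find::name is the path including $top.
        push @found, $File::Find::name if -f $_ && /\Q$suffix\E\z/;
    }, $top);
    return @found;
}

# Example call; the starting directory and suffix are placeholders.
print "$_\n" for find_by_suffix('.', '.search');
```

find() accepts a list of starting directories, so several roots can be searched in one call if needed.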


There is also DirHandle, an object-oriented wrapper around opendir, readdir, and rewinddir:

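use DirHandle;

# something() and something_else() stand in for your own processing.
my $d = DirHandle->new(".");    # undef if the directory cannot be opened
if (defined $d) {
    # First pass over the directory entries.
    while (defined($_ = $d->read)) { something($_); }
    $d->rewind;                 # start again from the first entry
    # Second pass over the same entries.
    while (defined($_ = $d->read)) { something_else($_); }
    undef $d;                   # releases the directory handle
}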

For use cases of readdir and glob, see "What reasons are there to prefer glob over readdir (or vice-versa) in Perl?"

I prefer glob for quickly grabbing a list of files in a directory (no subdirectories) and processing them, like this:

process_bam($_) for glob("bam_files/*.bam");

This is more convenient because it does not return . and .. even if you ask for everything (*), and it also returns each name with its directory prefix if you put a directory in the glob pattern.
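A quick way to see both points, as a minimal sketch: visible_entries is a made-up helper name, and the sketch assumes the directory path contains no spaces or wildcard characters, since glob would interpret those.

```perl
use strict;
use warnings;

# Hypothetical helper: everything directly inside $dir that "*" matches.
sub visible_entries {
    my ($dir) = @_;
    # "*" does not match names starting with a dot, so ".", ".." and
    # other dot files are left out; each result keeps the "$dir/" prefix.
    return glob("$dir/*");
}

# Example call on the current directory.
print "$_\n" for visible_entries('.');
```

With readdir you would get bare names, including the dot entries, and would have to prepend the directory yourself.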

You can also use glob in a quick one-liner, piped to xargs or used in a bash for loop, when you need to preprocess the file names in the list:

perl -lE 'print join("\n", map {s/srf\/(.+)\.srf$/$1/;$_} glob("srf/198*.srf"))' | xargs -n 1.....

readdir has advantages in other scenarios, so use whichever one fits your task better.
