
Fast (low-level) method to recursively process files in folders

My application indexes the contents of all hard drives on end users' computers. I am using Directory.GetFiles and Directory.GetDirectories to recursively process the whole folder structure. I am indexing only a few selected file types (up to 10 file types).

I can see in the profiler that most of the indexing time is spent enumerating files and folders; depending on the ratio of files that are actually indexed, this is up to 90 percent of the total time.

I would like to make the indexing as fast as possible. I have already optimized the indexing itself and processing of the indexed files.

I was thinking of using Win32 API calls, but the profiler shows that most of the processing time is already spent in the API calls that .NET itself makes.

Is there a (possibly low-level) method accessible from C# that would make enumeration of files/folders at least partially faster?


As requested in the comment, my current code (just a scheme with irrelevant parts trimmed):

    private IEnumerable<IndexedEntity> RecurseFolder(string indexedFolder)
    {
        //for a single extension:
        string[] files = Directory.GetFiles(indexedFolder, extensionFilter);
        foreach (string file in files)
        {
            yield return ProcessFile(file);
        }
        foreach (string directory in Directory.GetDirectories(indexedFolder))
        {
            //recursively process all subdirectories
            foreach (var ie in RecurseFolder(directory))
            {
                yield return ie;
            }
        }
    }


In .NET 4.0, there are built-in enumerable file-listing methods (Directory.EnumerateFiles and friends); since that release isn't far away, I would try using those. This may matter in particular if you have any folders that are massively populated, since GetFiles has to allocate a large array per folder.
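As a minimal sketch of that approach (Directory.EnumerateFiles is the real .NET 4.0 API; the wrapper name and extension list here are just illustrative assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LazyWalker
{
    // Lazily stream matching files; unlike GetFiles, no full string[]
    // is materialized per folder before results are yielded.
    public static IEnumerable<string> Walk(string root, params string[] filters)
    {
        return filters.SelectMany(filter =>
            Directory.EnumerateFiles(root, filter, SearchOption.AllDirectories));
    }
}

// Hypothetical usage with a couple of the indexed extensions:
// foreach (var file in LazyWalker.Walk(@"C:\", "*.txt", "*.doc"))
//     ProcessFile(file);
```

One caveat: with SearchOption.AllDirectories, an UnauthorizedAccessException on any subfolder aborts the whole enumeration, so for indexing entire drives you may still want to recurse one directory level at a time and catch access errors yourself.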

If depth is the issue, I would consider flattening your method to use a local stack/queue and a single iterator block. This shortens the code path used to enumerate deep folders:

    private static IEnumerable<string> WalkFiles(string path, string filter)
    {
        var pending = new Queue<string>();
        pending.Enqueue(path);
        string[] tmp;
        while (pending.Count > 0)
        {
            path = pending.Dequeue();
            tmp = Directory.GetFiles(path, filter);
            for(int i = 0 ; i < tmp.Length ; i++) {
                yield return tmp[i];
            }
            tmp = Directory.GetDirectories(path);
            for (int i = 0; i < tmp.Length; i++) {
                pending.Enqueue(tmp[i]);
            }
        }
    }

Iterate over that, calling ProcessFile on each result.


If you believe that the .NET implementation is causing the problem, then I suggest you drop down to the native find APIs: FindFirstFile, FindNextFile, etc. (exposed by the C runtime as _findfirst, _findnext).

It seems to me that the .NET approach uses a lot of memory because the directory listings are fully copied into arrays at each level of recursion; if your directory structure is 10 levels deep, you have 10 versions of the file array alive at any given moment, plus an allocation and deallocation of such an array for every directory in the structure.

Using the same recursive technique with FindFirstFile etc. only requires keeping a handle to a position in the directory structure at each level of recursion.
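As an illustration only, here is a hedged P/Invoke sketch of FindFirstFile/FindNextFile combined with the queue-based flattening from the other answer; the NativeWalker name and the skip-on-error policy are my assumptions, not tested production code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class NativeWalker
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh, nFileSizeLow, dwReserved0, dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    // One native find handle per directory; no per-folder string[]
    // allocations as with Directory.GetFiles/GetDirectories.
    public static IEnumerable<string> Walk(string root)
    {
        var pending = new Queue<string>();
        pending.Enqueue(root);
        while (pending.Count > 0)
        {
            string path = pending.Dequeue();
            WIN32_FIND_DATA data;
            IntPtr handle = FindFirstFile(Path.Combine(path, "*"), out data);
            if (handle == INVALID_HANDLE_VALUE)
                continue; // e.g. access denied; skip this folder
            try
            {
                do
                {
                    if (data.cFileName == "." || data.cFileName == "..")
                        continue;
                    string full = Path.Combine(path, data.cFileName);
                    if ((data.dwFileAttributes & FileAttributes.Directory) != 0)
                        pending.Enqueue(full);
                    else
                        yield return full;
                } while (FindNextFile(handle, out data));
            }
            finally { FindClose(handle); }
        }
    }
}
```

You would still apply your extension filter on the caller's side (or pass a pattern instead of "*"); the point of the sketch is that each directory is walked with a single handle rather than a copied array.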
