C# Folder Search Using a Regular Expression

What is the most efficient way to get a list of folders under a top-level directory that match a certain regular expression? I am currently iterating recursively over the subfolders and testing each one against the regular expression; when one matches, I grab the file names along with the directory path.

With this approach the search currently takes approximately 50 minutes because of the number of folders in this directory.

private void ProcessFiles(string path, string searchPattern)
{
    string pattern = @"^(\\\\server\\folder1\\subfolder\\(MENS|WOMENS|MENS\sDROPBOX|WOMENS\sDROPBOX)\\((((COLOR\sCHIPS)|(ALL\sMENS\sCOLORS)))|((\d{4})\\(\w+)\\(FINAL\sART|FINAL\sARTWORK)\\(\d{3}))))$";
    DirectoryInfo di = new DirectoryInfo(path);
    try
    {
        Debug.WriteLine("I'm in " + di.FullName);
        if (di.Exists)
        {
            DirectoryInfo[] dirs = di.GetDirectories("*", SearchOption.TopDirectoryOnly);
            foreach (DirectoryInfo d in dirs)
            {
                string[] splitPath = d.FullName.Split('\\');


                var dirMatch = new Regex(pattern, RegexOptions.IgnoreCase);

                if (dirMatch.IsMatch(d.FullName))
                {
                    Debug.WriteLine("---Processing Directory: " + d.FullName + " ---");
                    FileInfo[] files = d.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);
                    AddColor(files, splitPath);
                }
                ProcessFiles(d.FullName, searchPattern);
            }
        }


    }
    catch (Exception e)
    {

    }

}


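I would use something like the following. There is no need to write the recursion yourself; let the BCL do that for you:

// Verbatim string (@"...") so that \s, \d and \w reach the regex engine unchanged.
// Parentheses are balanced against the pattern in the question; the trailing $
// keeps deeper subfolders of a matching folder from matching as well.
Regex re = new Regex(@"(MENS|WOMENS|MENS\sDROPBOX|WOMENS\sDROPBOX)\\((((COLOR\sCHIPS)|(ALL\sMENS\sCOLORS)))|((\d{4})\\(\w+)\\(FINAL\sART|FINAL\sARTWORK)\\(\d{3})))$",
                     RegexOptions.IgnoreCase);
// dirPath is the top-level directory; "*" returns every subdirectory,
// so the regex does all of the filtering.
var dirs = from dir in 
           Directory.EnumerateDirectories(dirPath, "*",
           SearchOption.AllDirectories)
           where re.IsMatch(dir)
           select dir;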

If it still takes 50 minutes, the bottleneck is the storage itself: a slow drive, a network share, or similar.

EDIT: you edited your question, and it now clearly shows that you're running your code against a UNC path. That is extremely slow. If you need speed, run the search on the server itself.

Note: there's a big difference between the behavior of GetDirectories (which you use) and EnumerateDirectories. Microsoft's documentation says this about it:

The EnumerateDirectories and GetDirectories methods differ as follows: When you use EnumerateDirectories, you can start enumerating the collection of names before the whole collection is returned; when you use GetDirectories, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateDirectories can be more efficient.

Regarding your question: it will go through all directories it has access to. Don't start it on a directory you don't have access to, because that raises an exception.
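One way to keep a single unreadable folder from aborting the whole search is to walk the tree one level at a time and catch the exception per directory. On at least some .NET Framework versions, a recursive EnumerateDirectories call can also throw partway through when it reaches a subfolder it cannot read, so this may matter even when the starting directory is accessible. What follows is only a sketch: FindMatchingDirectories is a hypothetical helper, not a BCL method, and it assumes that silently skipping unreadable folders is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class FolderSearch
{
    // Hypothetical helper: returns every directory below root whose full path
    // matches re, skipping directories that cannot be listed.
    public static List<string> FindMatchingDirectories(string root, Regex re)
    {
        var matches = new List<string>();
        var pending = new Stack<string>();   // explicit stack instead of recursion
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            try
            {
                // Lists only the immediate children of the current directory.
                foreach (string dir in Directory.EnumerateDirectories(current))
                {
                    if (re.IsMatch(dir))
                        matches.Add(dir);
                    pending.Push(dir);
                }
            }
            catch (UnauthorizedAccessException)
            {
                // No permission to list this directory: skip it and carry on.
            }
            catch (DirectoryNotFoundException)
            {
                // The directory vanished between being listed and being visited.
            }
        }
        return matches;
    }
}
```

The Regex is built once by the caller and reused for every folder, instead of being constructed inside the loop.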


To get the fastest results on a directory tree, the best way, in my opinion, is to use interop: FindFirstFile, FindNextFile and FindClose are your friends.

http://msdn.microsoft.com/en-us/library/aa364418%28v=vs.85%29.aspx

But don't expect the speed of light if you have a huge tree to traverse.
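A minimal P/Invoke sketch of that approach is shown here, listing the immediate subdirectories of one folder. It is Windows-only. NativeDirectoryWalker and EnumerateSubdirectories are hypothetical names chosen for illustration; the three kernel32 functions and the WIN32_FIND_DATA layout are the standard Win32 ones. As far as I know the BCL enumeration is built on the same family of calls, so any gain should be measured rather than assumed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

static class NativeDirectoryWalker
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindClose(IntPtr hFindFile);

    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // Yields the full path of each immediate subdirectory of path.
    // The caller decides whether and how to descend further.
    public static IEnumerable<string> EnumerateSubdirectories(string path)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(path, "*"), out data);
        if (handle == InvalidHandle)
            yield break;   // path is missing or cannot be listed

        try
        {
            do
            {
                bool isDirectory = (data.dwFileAttributes & FileAttributes.Directory) != 0;
                // "." and ".." are returned as entries and must be skipped.
                if (isDirectory && data.cFileName != "." && data.cFileName != "..")
                    yield return Path.Combine(path, data.cFileName);
            }
            while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);   // always release the search handle
        }
    }
}
```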


You could recursively launch additional threads on subfolders to try to leverage any parallel capabilities your system has, but the majority of the overhead is probably disk access.
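A simpler variant of that idea, sketched here under assumptions, is to hand each first-level subfolder to Parallel.ForEach and let every worker scan its own subtree. ParallelFolderSearch and Find are hypothetical names, and whether this helps at all depends on whether the disk or network is already the limiting factor.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static class ParallelFolderSearch
{
    // Hypothetical helper: scans each first-level subfolder of root on its own
    // worker and collects every directory whose full path matches re.
    public static string[] Find(string root, Regex re)
    {
        var matches = new ConcurrentBag<string>();   // safe for concurrent Add

        Parallel.ForEach(Directory.EnumerateDirectories(root), top =>
        {
            if (re.IsMatch(top))
                matches.Add(top);
            try
            {
                foreach (string dir in Directory.EnumerateDirectories(
                             top, "*", SearchOption.AllDirectories))
                {
                    if (re.IsMatch(dir))
                        matches.Add(dir);
                }
            }
            catch (UnauthorizedAccessException)
            {
                // Stop scanning this subtree; other subtrees are unaffected.
            }
        });

        return matches.ToArray();
    }
}
```

A single Regex instance can be shared across the workers, since matching on a constructed Regex is thread-safe. The order of the returned paths is not deterministic.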
