
Multithread file search C#

I need some help. Right now I have a file search that scans my entire hard drive, and it works. Here are the two methods that do it.

public void SearchFileRecursiveNonMultithreaded()
    {
        //Search files multiple drive

        string[] drives = Environment.GetLogicalDrives();

        foreach (string drive in drives)
        {
            if (GetDriveType(drive).ToString().CompareTo("DRIVE_FIXED") == 0)
            {
                DriveInfo driveInfo = new DriveInfo(drive);

                if (driveInfo.IsReady)
                {
                    System.IO.DirectoryInfo rootDirectory = driveInfo.RootDirectory;
                    RecursiveFileSearch(rootDirectory);
                }
            }
        }
        MessageBox.Show(files.Count.ToString());
    }

    public void RecursiveFileSearch(DirectoryInfo root)
    {
        DirectoryInfo[] subDirectory;
        try
        {
            // private List<FileInfo> files = new List<FileInfo>(); is declared above
            files.AddRange(root.GetFiles(searchString.Text, SearchOption.TopDirectoryOnly));
        }
        catch (Exception)
        {
        }

        try
        {
            // Now find all the subdirectories under this directory.
            subDirectory = root.GetDirectories();

            foreach (System.IO.DirectoryInfo dirInfo in subDirectory)
            {
                // Recursive call will be performed for each subdirectory.
                RecursiveFileSearch(dirInfo);
            }
        }
        catch (Exception e)
        {
            MessageBox.Show(e.ToString());
        }
    }

Right now I am trying to implement a parallel search to make the search faster. I have tried several approaches, including BackgroundWorker and plain threads, but I ran into problems and found it very difficult to debug what is going wrong. Can someone outline an approach for implementing a parallel search? I will work through the steps on my own. Any help provided will be greatly appreciated.


First, as somebody else pointed out, it's unlikely that using multiple threads will speed things up when you're searching just one drive. The vast majority of your time is spent waiting for the disk head to move to where it needs to be, and it can only be in one place at a time. Using multiple threads here is wasted effort, and has a high likelihood of actually making your program slower.

Second, you can simplify your code by just calling Directory.EnumerateFiles. If you want to search multiple drives concurrently, simply start multiple BackgroundWorker instances, each using EnumerateFiles to search a different drive.

Note, however, that EnumerateFiles will throw an exception (as will your code) if it runs into directory permission problems, which are not uncommon when searching an entire drive. If that's a problem (and it likely will be), then you have to write your own directory searcher. One such searcher is in the answer to this question.
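A minimal sketch of such a permission-tolerant walker might look like this. It is iterative rather than recursive, so deep trees can't blow the call stack, and it skips directories it cannot read instead of aborting; the class and method names here are illustrative, not an existing API:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SafeSearch
{
    // Iteratively walks 'root', yielding files that match 'pattern'.
    // Directories we lack permission for are skipped, not fatal.
    public static IEnumerable<string> FindFiles(string root, string pattern)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] matches = null;
            try
            {
                matches = Directory.GetFiles(dir, pattern);
                foreach (string sub in Directory.GetDirectories(dir))
                    pending.Push(sub);
            }
            catch (UnauthorizedAccessException) { /* no permission: skip */ }
            catch (DirectoryNotFoundException) { /* vanished mid-scan: skip */ }

            // yield outside the try block (C# forbids yield in try/catch)
            if (matches != null)
                foreach (string f in matches)
                    yield return f;
        }
    }
}
```

Because it returns a lazy IEnumerable, the caller can start consuming matches before the walk finishes, e.g. `foreach (var f in SafeSearch.FindFiles(@"C:\", "*.txt")) { ... }`.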


While searching logical drives simultaneously could help or hurt performance, here's how you might manage the threads:

    using System.Threading;
    ...

    string[] drives = Environment.GetLogicalDrives();
    List<Thread> threads = new List<Thread>();
    foreach (string drive in drives)
    {
        if (GetDriveType(drive).ToString().CompareTo("DRIVE_FIXED") == 0)
        {
            DriveInfo driveInfo = new DriveInfo(drive);

            if (driveInfo.IsReady)
            {
                System.IO.DirectoryInfo rootDirectory = driveInfo.RootDirectory;
                var thread = new Thread((dir) => RecursiveFileSearch((DirectoryInfo)dir));
                threads.Add(thread);
                thread.Start(rootDirectory);
            }
        }
    }
    foreach(var t in threads) t.Join();
    MessageBox.Show(files.Count.ToString());

Don't forget to lock any shared collection used by RecursiveFileSearch. You should try to avoid such access because it creates contention.
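If you do keep a single shared list, the locking pattern could be sketched like this (SharedListDemo and RunDemo are illustrative names; the point is simply that every writer takes the same lock before touching the list):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class SharedListDemo
{
    static readonly List<string> files = new List<string>();
    static readonly object filesLock = new object();

    // Every thread funnels results through here, so concurrent
    // AddRange calls cannot corrupt the shared list.
    static void AddResults(IEnumerable<string> batch)
    {
        lock (filesLock)
        {
            files.AddRange(batch);
        }
    }

    public static int RunDemo(int workers, int itemsPerWorker)
    {
        var threads = new List<Thread>();
        for (int i = 0; i < workers; i++)
        {
            int id = i; // capture a copy, not the loop variable
            var t = new Thread(() =>
            {
                var batch = new List<string>();
                for (int j = 0; j < itemsPerWorker; j++)
                    batch.Add("file-" + id + "-" + j);
                AddResults(batch);
            });
            threads.Add(t);
            t.Start();
        }
        foreach (var t in threads) t.Join();
        lock (filesLock) return files.Count;
    }
}
```

Batching results per thread and calling AddResults once, as above, keeps contention low: the lock is taken once per worker rather than once per file.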


Your outer loop, foreach (string drive in drives), could benefit from being turned into a Parallel.ForEach().

Your inner loop (the RecursiveFileSearch()) should not be made parallel; you'll just lose performance. But from .NET 4 you can replace GetFiles() with EnumerateFiles() to get better results on very large folders.

That also solves most of your thread-safety issues: have the outer loop provide a separate List for each drive to fill, then merge those lists after the ForEach().

The exact answer is more complicated: searching logical disks in parallel won't help much, since the gains come from independent spindles (separate physical drives). But on a big RAID volume, searching the files could benefit from a few extra threads.
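The per-drive-list shape described above could be sketched as follows (the class and method names are mine, and the permission handling is deliberately crude: one inaccessible directory abandons that root, which a real searcher would avoid by walking directories itself):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class ParallelDriveSearch
{
    // Each root fills its own private list inside the Parallel.ForEach
    // body (no locking needed); the lists are merged afterwards.
    public static List<string> SearchRoots(IEnumerable<string> roots, string pattern)
    {
        var perRoot = new ConcurrentQueue<List<string>>();
        Parallel.ForEach(roots, root =>
        {
            var local = new List<string>();
            try
            {
                // EnumerateFiles streams results instead of building one big array.
                local.AddRange(Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories));
            }
            catch (UnauthorizedAccessException)
            {
                // Crude: gives up on this root entirely. A real searcher
                // would recurse itself and skip only the bad directory.
            }
            perRoot.Enqueue(local);
        });
        return perRoot.SelectMany(l => l).ToList();
    }
}
```

It would be called with the same fixed-drive filtering as the question, e.g. `SearchRoots(DriveInfo.GetDrives().Where(d => d.DriveType == DriveType.Fixed && d.IsReady).Select(d => d.RootDirectory.FullName), "*.txt")`.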


One solution to make it multithreaded is to dump each call to RecursiveFileSearch into ThreadPool.QueueUserWorkItem so that it runs across multiple threads.

Now, be cautious with this approach, for the following reasons:

1) As Dypple stated, access to a single drive is effectively serialized, so this really could hurt performance.

2) List is not thread-safe, so you would need to lock/synchronize on it before adding to the list. That could also hurt performance a lot. Consider using System.Collections.Concurrent.ConcurrentBag (in .NET 4.0) to handle synchronization for you, since you are only doing additions.

3) Adding every file you encounter to the list can overflow it if you have more than int.MaxValue files.

4) The file collection could become huge and may result in an out-of-memory exception.
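The ThreadPool approach with ConcurrentBag from point 2 might be sketched like this: the bag collects matches without explicit locks, and an Interlocked counter tracks outstanding directories so we know when the whole search has drained. PoolSearch and its members are illustrative names, and the static one-shot state is a simplification:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

static class PoolSearch
{
    static readonly ConcurrentBag<string> Files = new ConcurrentBag<string>();
    static int pending; // directories queued but not yet finished
    static readonly ManualResetEvent Done = new ManualResetEvent(false);

    static void Search(string dir, string pattern)
    {
        try
        {
            foreach (var f in Directory.GetFiles(dir, pattern))
                Files.Add(f); // ConcurrentBag: no lock needed for additions

            foreach (var sub in Directory.GetDirectories(dir))
            {
                // Increment BEFORE queueing so pending can't hit zero early.
                Interlocked.Increment(ref pending);
                ThreadPool.QueueUserWorkItem(_ => Search(sub, pattern));
            }
        }
        catch (UnauthorizedAccessException) { /* skip unreadable directory */ }
        finally
        {
            // The last directory to finish signals completion.
            if (Interlocked.Decrement(ref pending) == 0)
                Done.Set();
        }
    }

    public static ConcurrentBag<string> Run(string root, string pattern)
    {
        Interlocked.Increment(ref pending);
        ThreadPool.QueueUserWorkItem(_ => Search(root, pattern));
        Done.WaitOne();
        return Files;
    }
}
```

The static state makes this usable only once per process; a reusable version would wrap the bag, counter, and event in an instance. Streaming matches to disk instead of keeping them all in the bag would also address points 3 and 4.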

