How to do a recursive search of folders and files using producer/consumer queues?
I would like to search directories first, then the files inside them, for a keyword.
I know I need two classes, a Producer class and a Consumer class, but I do not know how to perform the search with a C# producer/consumer queue.
public class Program
{
private static void Main()
{
Queue<File> searchFile = new Queue<File>();
Queue<Directory> searchDirectory = new Queue<Directory>();
new Thread(searchDirectory).Start();
for (int i = 0; i < 3; i++)
new Thread(searchFile).Start();
}
}
Initial problems:
- You are declaring 2 variables of different types using the same variable name with the same scope.
- You do not want to start a threaded search on the Directory and another one on the File.
The problem with item number 2 is that you are working against one of the biggest bottlenecks in multi-threaded code: disk I/O. On a standard HDD, running more than one worker thread that performs disk I/O gains you nothing.
Explain more about what you are trying to do (with an example, please). There may be a better process.
First, Directory is a static class, so you cannot use it as a type argument or add one to a collection. You will need to use DirectoryInfo instead. Second, I would use a single queue that holds DirectoryInfo instances. The files can then be enumerated as part of processing a single folder.
Here is how I would do it using the producer-consumer pattern. This implementation uses the BlockingCollection class, which is an implementation of a blocking queue. Blocking queues are quite useful in the producer-consumer pattern because they abstract away nearly all of the producer-consumer details.
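using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public class Searcher
{
    // Example value only; tune the number of worker threads for your workload.
    private const int NUMBER_OF_THREADS = 4;

    private readonly BlockingCollection<DirectoryInfo> m_Queue = new BlockingCollection<DirectoryInfo>();

    public Searcher()
    {
        // Start the consumer threads. Background threads do not keep the process alive.
        for (int i = 0; i < NUMBER_OF_THREADS; i++)
        {
            var thread = new Thread(Run);
            thread.IsBackground = true;
            thread.Start();
        }
    }

    public void Search(DirectoryInfo root)
    {
        m_Queue.Add(root);
    }

    private void Run()
    {
        while (true)
        {
            // Wait for an item to appear in the queue.
            DirectoryInfo root = m_Queue.Take();
            try
            {
                // Add each child directory to the queue. This is the recursive part.
                foreach (DirectoryInfo child in root.GetDirectories())
                {
                    m_Queue.Add(child);
                }
                // Now we can enumerate each file in the directory.
                foreach (FileInfo child in root.GetFiles())
                {
                    // Add your search logic here.
                }
            }
            catch (UnauthorizedAccessException)
            {
                // Skip folders we may not read; an unhandled exception would end the process.
            }
        }
    }
}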
I should point out that most disks work in a more serialized manner, so having multiple threads search through files might not buy you much unless the CPU-bound portion of your logic is extensive.
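One thing the Searcher above leaves open is how to tell when the search has finished, since its workers loop forever. The following is only a sketch of one possible approach, not part of the original answer: it counts the directories that are queued or in progress and calls CompleteAdding when that count reaches zero. The names TerminatingSearcher, Search, keyword and workerCount are made up for illustration, and matching on the file name stands in for whatever search logic you actually need.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not from the original answer.
public static class TerminatingSearcher
{
    // Returns the full paths of files whose name contains the keyword.
    public static ConcurrentBag<string> Search(string rootPath, string keyword, int workerCount)
    {
        var queue = new BlockingCollection<DirectoryInfo>();
        var results = new ConcurrentBag<string>();
        int pending = 1; // directories queued or being processed; 1 accounts for the root

        queue.Add(new DirectoryInfo(rootPath));

        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                // The loop ends once CompleteAdding has been called and the queue is empty.
                foreach (DirectoryInfo dir in queue.GetConsumingEnumerable())
                {
                    try
                    {
                        foreach (DirectoryInfo child in dir.GetDirectories())
                        {
                            // Count the child before queuing it so pending never hits 0 early.
                            Interlocked.Increment(ref pending);
                            queue.Add(child);
                        }
                        foreach (FileInfo file in dir.GetFiles())
                        {
                            if (file.Name.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                                results.Add(file.FullName);
                        }
                    }
                    catch (UnauthorizedAccessException) { /* skip unreadable folders */ }
                    catch (IOException) { /* skip folders that vanished or failed to read */ }
                    finally
                    {
                        // This directory is done; if nothing else is outstanding, stop all workers.
                        if (Interlocked.Decrement(ref pending) == 0)
                            queue.CompleteAdding();
                    }
                }
            }, TaskCreationOptions.LongRunning);
        }

        Task.WaitAll(workers);
        return results;
    }
}
```

A call such as TerminatingSearcher.Search(@"C:\some\folder", "report", 4) blocks until the whole tree has been visited. The counter is incremented before each child is queued and decremented only after its parent has been fully processed, so it cannot reach zero while work is still outstanding.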
As other posters suggest, multiple threads trying to carry out I/O will cause problems. However, you could first build the full queue of directories (useful if the tree is very deep) and then use separate threads to run a regex over the files. A bit like this:
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static int _counter = 0;

    static void Main(string[] args)
    {
        ConcurrentQueue<DirectoryInfo> concurrentQueue = new ConcurrentQueue<DirectoryInfo>();
        DirectoryInfo root = new DirectoryInfo(@"C:\local\oracle");
        // Queue the root itself so that its own files are searched too.
        concurrentQueue.Enqueue(root);
        GetAllDirectories(root, concurrentQueue);

        Action action = () =>
        {
            const string toFind = "ora";
            DirectoryInfo info;
            while (concurrentQueue.TryDequeue(out info))
            {
                FindInFile(toFind, info);
            }
        };
        // Run four consumers that drain the pre-built queue.
        Parallel.Invoke(action, action, action, action);
        Console.WriteLine("total found " + _counter);
        Console.ReadKey();
    }

    static void FindInFile(string textToFind, DirectoryInfo dirInfo)
    {
        foreach (FileInfo file in dirInfo.GetFiles())
        {
            string content;
            using (StreamReader reader = new StreamReader(file.FullName))
            {
                content = reader.ReadToEnd();
            }
            // Regex.Matches returns every occurrence; Regex.Match returns only the first.
            MatchCollection matches = Regex.Matches(content, textToFind, RegexOptions.Multiline);
            if (matches.Count > 0)
            {
                // _counter counts matching files, not individual occurrences.
                Interlocked.Increment(ref _counter);
                Console.WriteLine(file.FullName + " found " + matches.Count);
                foreach (Match match in matches)
                {
                    Console.WriteLine("-------------> char index " + match.Index);
                }
            }
        }
    }

    // Recursively adds every directory below root (but not root itself) to the queue.
    internal static void GetAllDirectories(DirectoryInfo root, ConcurrentQueue<DirectoryInfo> values)
    {
        foreach (var di in root.GetDirectories())
        {
            GetAllDirectories(di, values);
            values.Enqueue(di);
        }
    }
}
I've edited the post (the edit is awaiting peer review) to fix the basic scope issue and the typos in the code, but I don't think you are ready for multi-threading, let alone producer-consumer queues. (God knows I've dabbled in multi-threading for a while and I still end up messing up my implementations, but that's probably just me!)
You should first get comfortable with scopes and multi-threading. In particular, read up on locking mechanisms and concurrency issues, which are critical to implementing a successful multi-threaded solution.
Secondly, as IAbstract suggests, implement multiple threads with mutexes / semaphores to gain performance with multi-threading as well as to get your desired producer-consumer queue.
Also, if you are comfortable, you can look at the latest Async CTP1 DataFlow library, which supports this pattern using the Task Parallel Library. Alternatively, you can use BlockingCollection to implement this pattern.
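As a rough illustration of the BlockingCollection route, here is a minimal sketch; it is not taken from any of the answers. It assumes a single producer that does all the disk reads and several consumers that only do the in-memory keyword check, which fits the advice above about disk I/O. The names KeywordSearch, FindFiles, keyword and consumerCount, as well as the queue capacity of 16, are invented for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class KeywordSearch
{
    // Returns the sorted paths of all files under rootPath whose text contains the keyword.
    public static List<string> FindFiles(string rootPath, string keyword, int consumerCount)
    {
        // Bounded (16 is an arbitrary choice) so the producer cannot read far ahead of the consumers.
        var queue = new BlockingCollection<KeyValuePair<string, string>>(16);
        var matches = new ConcurrentBag<string>();

        // Producer: the only thread that touches the disk.
        Task producer = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (string path in Directory.EnumerateFiles(rootPath, "*", SearchOption.AllDirectories))
                {
                    queue.Add(new KeyValuePair<string, string>(path, File.ReadAllText(path)));
                }
            }
            finally
            {
                // Always signal completion, otherwise the consumers would block forever.
                queue.CompleteAdding();
            }
        });

        // Consumers: CPU-only work on content that is already in memory.
        var consumers = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Factory.StartNew(() =>
            {
                foreach (KeyValuePair<string, string> item in queue.GetConsumingEnumerable())
                {
                    if (item.Value.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                        matches.Add(item.Key);
                }
            });
        }

        Task.WaitAll(consumers);
        producer.Wait(); // surfaces any exception thrown while reading, e.g. access denied

        var result = new List<string>(matches);
        result.Sort(StringComparer.Ordinal);
        return result;
    }
}
```

Note that Directory.EnumerateFiles with SearchOption.AllDirectories stops at the first folder it is not allowed to read, so a real implementation would probably walk the tree by hand and skip such folders. Reading whole files into memory is also only reasonable for small text files.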
Stack Overflow also has questions related to yours, with some excellent answers. Just search for "producer-consumer" to read them.