Multiple threads to load XML files into memory
I have a set of XML files that I want to load into memory in order to process.
I am loading the files into a Collection and it seems that it is a lot faster if I load the files in a single thread rather than using the thread pool.
I would have thought this would have been the other way around.
Why is using multiple threads to load files into memory significantly slower than iterating through the file list and loading each file one after another on a single thread?
This is with C# on .NET 3.5.
The code:
ICollection<XmlDocument> xmlFilesToProcess = new Collection<XmlDocument>();
foreach (FileInfo fileInfo in fileList)
{
    ThreadPool.QueueUserWorkItem(
        (o) =>
        {
            XmlDocument doc = new XmlDocument();
            doc.Load((string)o);
            lock (xmlFilesToProcess)
            {
                xmlFilesToProcess.Add(doc);
                counter++;
            }
        }, fileInfo.FullName);
}
Without seeing the code, it's hard to tell. If the size and/or number of XML files is small and you only have one CPU, then it could simply be that the context switching between threads takes more time than just reading the files.
EDIT
Now that I see the code, you are creating way too many threads. I suggest you use Parallel.For from the TPL, which is available for .NET 3.5.
See http://msdn.microsoft.com/en-us/magazine/cc163340.aspx for more info on TPL.
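As a rough sketch of that suggestion, the loading loop could look something like the following. It assumes the Parallel class and ParallelOptions as they ship in System.Threading.Tasks with .NET 4 and later; the XmlLoader class, the LoadAll method and the maxParallel parameter are made-up names for illustration, and the API of the .NET 3.5 preview builds may differ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public static class XmlLoader
{
    // Hypothetical helper: loads every file in 'paths' into an XmlDocument,
    // running at most 'maxParallel' loads at the same time.
    public static List<XmlDocument> LoadAll(IEnumerable<string> paths, int maxParallel)
    {
        var docs = new List<XmlDocument>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(paths, options, path =>
        {
            var doc = new XmlDocument();
            doc.Load(path);      // disk read and parse happen outside the lock
            lock (docs)
            {
                docs.Add(doc);   // List<T> is not thread-safe, so guard the Add
            }
        });

        // Parallel.ForEach returns only after every file has been processed,
        // so the list is complete here. The order of documents is not guaranteed.
        return docs;
    }
}
```

Capping the degree of parallelism at a small number (for example 2 to 4) keeps the number of simultaneous disk reads low; whether that actually beats a plain foreach loop still has to be measured.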
Without seeing the code, I would guess it probably has to do with the fact that reading from disk is the slow part of the operation. Since the disk can really only read one file at a time, it becomes the bottleneck.
Whenever you need to make a decision on multi-threading vs single-threading you need to benchmark, ideally on a machine that is going to run your application.
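A minimal way to take such a measurement is to wrap each variant in a Stopwatch. The LoadBenchmark class and Measure method below are hypothetical names used only for this sketch.

```csharp
using System;
using System.Diagnostics;

public static class LoadBenchmark
{
    // Runs 'action' once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

You would call it once with the single-threaded loop and once with the multi-threaded one, making sure the multi-threaded action waits for all work items to finish before it returns. Run each variant several times, since the operating system's file cache usually makes whichever version runs second look faster.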
Multi-threaded code can be slower because of the extra overhead of thread synchronization. Even if you use the ThreadPool, there is an initial overhead for thread creation.
It is difficult to say whether single- or multi-threading is better without knowing the details of the problem to solve.
Also, it is difficult to tell why one piece of code is slower than another without seeing the code.