C++ map performance - Linux (30 sec) vs Windows (30 mins)!
I need to process a list of files. The processing action should not be repeated for the same file. The code I am using for this is -
using namespace std;
vector<File*> gInputFileList; //Can contain duplicates, File has member sFilename
map<string, File*> gProcessedFileList; //Using map to avoid linear search costs
void processFile(File* pFile)
{
    File* pProcessedFile = gProcessedFileList[pFile->sFilename];
    if(pProcessedFile != NULL)
        return; //Already processed
    foo(pFile); //foo() is the action to do for each file
    gProcessedFileList[pFile->sFilename] = pFile;
}

int main()
{
    size_t n = gInputFileList.size(); //Using array syntax (iterator syntax also gives identical performance)
    for(size_t i = 0; i < n; i++){
        processFile(gInputFileList[i]);
    }
}
The code works correctly, but...
My problem is that when the input size is 1000, it takes 30 minutes - HALF AN HOUR - on Windows/Visual Studio 2008 Express. For the same input, it takes only 40 seconds to run on Linux/gcc!
What could be the problem? The action foo() takes only a very short time to execute, when used separately. Should I be using something like vector::reserve for the map?
EDIT, EXTRA INFORMATION
What foo() does is: 1. it opens the file 2. reads it into memory 3. closes the file 4. the contents of the file in memory is parsed 5. it builds a list of tokens; I'm using a vector for that.
Whenever I break the program (while running the program with the 1000+ files input set): the call-stack shows that the program is in the middle of a std::vector add.
In Microsoft Visual Studio, there is a global lock when accessing the Standard C++ Library in Debug builds, to protect against multi-threading issues. This can cause big performance hits. For instance, our full test code runs on Linux/gcc in 50 minutes, whereas it needs 5 hours on Windows VC++ 2008. Note that this performance hit does not exist when compiling in Release mode, using the non-debug Visual C++ runtime.
I would approach it like any performance problem. This means: profiling. MSVC has a built-in profiler, by the way, so it may be a good chance to get familiar with it.
Break into the program using the debugger at a random time, and the chances are very high that the stack trace will tell you where it's spending the time.
I very very strongly doubt that your performance problem is coming from the STL containers.
Try to eliminate (comment out) the call to foo(pFile) or any other method which touches the filesystem. Although running foo(pFile) once may appear fast, running it on 1000 different files (especially on Windows filesystems, in my experience) could turn out to be much slower (e.g. because of filesystem cache behaviour).
EDIT
Your initial post was claiming that BOTH debug and release builds were affected. Now you are withdrawing that claim.
Be aware that in DEBUG builds:
- the STL implementation performs extra checks and assertions
- heap operations (memory allocation etc.) perform extra checks and assertions; moreover, under debug builds the low-fragmentation heap is disabled (up to a 10x overall slowdown in memory allocation)
- no code optimizations are performed, which may result in further STL performance degradation (the STL often relies heavily on inlining, loop unrolling, etc.)
With 1000 iterations you are probably not affected by the above (not at the outer loop level at least) unless you use STL/the heap heavily INSIDE foo().
I would be astounded if the performance issues you are seeing have anything at all to do with the map class. Doing 1000 lookups and 1000 insertions should take a combined time on the order of microseconds. What is foo() doing?
Without knowing how the rest of the code fits in, I think the overall idea of caching processed files is a little flaky.
Try removing duplicates from your vector first, then process them all.
Try commenting each block or major operation to determine which part actually caused the difference in execution time in Linux and Windows. I also don't think it would be because of the STL map. The problem may be inside foo(). It may be in some file operation as it is the only thing I could think of that would be costly in this case.
You may insert clock() calls in between operations to get an idea of the execution time.
You say that when you break, you find yourself inside vector::add. You don't have a vector::add in the code you've shown us, so I suspect it's inside the foo function. Without seeing that code, it's going to be difficult to say what's up.
You might have inadvertently created a Shlemiel the Painter algorithm.
You can improve things somewhat if you ditch your map and partition your vector instead. This implies reordering the input files list. It also means you have to find a way of quickly determining if a file has been processed already, possibly by holding a flag in the File class. If it's OK to reorder the files list and you can store that dirty flag in the File object, then you can improve performance from O(n log m) to O(n), for n total files and m processed files.
#include <algorithm>
#include <functional>
// ...
vector<File*>::iterator end(partition(inputfiles.begin(), inputfiles.end(),
                                      not1(mem_fun(&File::is_processed))));
for_each(inputfiles.begin(), end, processFile);
If you can't reorder the files list or can't change the File object, then you can replace the map with a vector and shadow each file in the input files list with a flag in the second vector at the same index. This will cost you O(n) space but will give you an O(1) check for dirty state.
vector<File*> processed(inputfiles.size(), 0);
for( vector<File*>::size_type i(0); i != inputfiles.size(); ++i ) {
    if( processed[i] != 0 ) continue; // O(1): already processed, skip it
    // ...
    processed[i] = inputfiles[i]; // O(1)
}
But be careful: You're dealing with two distinct pointers pointing at the same address, and that's the case for each pair of pointers in the two containers. Make sure one and only one pointer owns the pointee.
I don't expect either of these to yield a solution for that performance hit, but nevertheless.
If you are doing most of your work on Linux, then I strongly suggest you only ever compile in Release mode on Windows. That makes life much easier, especially considering all of Windows' inflexible library-handling headaches.