Why is my C++ app faster than my C app (using the same library) on a Core i7?
I have a library written in C and two applications that use it, one written in C++ and one in C. This library is a communication library, so one of the API calls looks like this:
int source_send( source_t* source, const char* data );
In the C app the code does something like this:
source_t* source = source_create();
for( int i = 0; i < count; ++i )
    source_send( source, "test" );
Whereas the C++ app does this:
struct Source
{
    Source()
    {
        _source = source_create();
    }
    bool send( const std::string& data )
    {
        source_send( _source, data.c_str() );
    }
    source_t* _source;
};
int main()
{
    Source* source = new Source();
    for( int i = 0; i < count; ++i )
        source->send( "test" );
}
On an Intel Core i7 the C++ code produces almost exactly 50% more messages per second, whereas on an Intel Core 2 Duo it produces almost exactly the same number of messages per second. (The Core i7 has 4 cores with 2 hardware threads each.)
I am curious what kind of magic the hardware performs to pull this off. I have some theories but I thought I would get a real answer :)
Edit: Additional information from comments
The compiler is Visual C++, so both machines are Windows boxes.
The implementation of the communication library creates a new thread to send messages on. The source_create is what creates this thread.
From examining your source code alone, I can't see any reason why the C++ code should be faster.
The next thing I would do is check out the assembly code that is being generated. If you are using a GNU toolchain, you have a couple of ways to do that.
You can ask gcc and g++ to output the assembly code via the -S command line argument. Make sure that, other than adding that argument, you use the exact same command line arguments that you do for a regular compile.
A second option is to load your program with gdb and use the disas command.
Good luck.
Update
You can do the same things with the Microsoft Toolchain.
To get the compiler to output assembly, you can use either /FA or /FAs. The first should output assembly only while the second will mix assembly and source (which should make it easier to follow).
As for using the debugger, once you have the debugger started in Visual Studio, navigate to "Debug | Windows | Disassembly" (verified on Visual Studio 2005, other versions may vary).
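As a concrete starting point, the wrapper from the question can be isolated in one small file and compiled to assembly with the switches above. This is only a sketch: the source_t, source_create and source_send definitions below are hypothetical stubs standing in for the real library, the file name wrapper.cpp is assumed, and send_many is a helper invented for illustration.

```cpp
// wrapper.cpp: isolates the C++ call path so its assembly is easy to read.
// Assumed commands (substitute your real optimization flags):
//   g++ -O2 -S wrapper.cpp        -> writes wrapper.s
//   cl /O2 /FAs /c wrapper.cpp    -> writes wrapper.asm with source interleaved
#include <string>

// Hypothetical stand-ins for the real library, for illustration only.
struct source_t { int sent; };

extern "C" source_t* source_create() {
    static source_t instance = { 0 };
    return &instance;
}

extern "C" int source_send(source_t* source, const char* data) {
    if (data == nullptr) return -1;
    ++source->sent;
    return 0;
}

struct Source {
    Source() : _source(source_create()) {}
    // Assumption: a zero return from source_send means success.
    bool send(const std::string& data) {
        return source_send(_source, data.c_str()) == 0;
    }
    source_t* _source;
};

// Call site to look for in the listing: is send() inlined, and is a
// std::string temporary constructed on every iteration?
int send_many(Source& source, int count) {
    int ok = 0;
    for (int i = 0; i < count; ++i)
        if (source.send("test")) ++ok;
    return ok;
}
```

In the generated listing, search for send_many and compare its loop body with the loop in the C build.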
Without seeing the full code or the assembly, my best guess is that the C++ compiler is inlining for you. One of the beauties of C++ compilers is the ability to inline just about anything for speed, and Microsoft's compilers are well known to inline gratuitously, almost to the point of unreasonably bloating the final executables.
The first thing I would recommend doing is to profile both versions and see if there are any noticeable differences.
Is the C version copying something unnecessarily? (It could be a subtle or not so subtle optimization like the return value optimization.)
This should show up in a good profiler. If you have a higher-end Visual Studio SKU, the sampling-based profiler there is good. If you're looking for a good free profiler, the Windows Performance Analyzer is incredibly powerful on Vista and up; here's a walkthrough on using the stackwalking option.
The first thing I would probably do myself is break into the debugger and inspect the disassembly for each to see if they are noticeably different. Note that there is a compiler option to write the asm out to a text file.
I would follow this up with a profile if there wasn't something glaringly obvious (like an extra copy).
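Before reaching for a full profiler, a crude throughput measurement can confirm whether the gap comes from the call path at all. A minimal sketch, assuming a stand-in fake_send that only counts calls (it is not the real library, and every name here is hypothetical):

```cpp
#include <chrono>
#include <string>

// Hypothetical stand-in for the library call; it only counts invocations.
static long g_calls = 0;
static int fake_send(const char* data) {
    if (data != nullptr) ++g_calls;
    return 0;
}

// C-style call path: passes the string literal straight through.
static void run_c_style(int count) {
    for (int i = 0; i < count; ++i)
        fake_send("test");
}

// C++-style call path: a std::string temporary is built for every call,
// as happens when a literal is passed to a const std::string& parameter.
static void send_wrapped(const std::string& data) {
    fake_send(data.c_str());
}
static void run_cpp_style(int count) {
    for (int i = 0; i < count; ++i)
        send_wrapped("test");
}

// Runs fn(count) once and returns calls per second (0 if the clock
// could not resolve the elapsed time).
template <typename Fn>
double calls_per_second(Fn fn, int count) {
    auto start = std::chrono::steady_clock::now();
    fn(count);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() > 0.0 ? count / elapsed.count() : 0.0;
}
```

Comparing calls_per_second(run_c_style, n) with calls_per_second(run_cpp_style, n) for a large n on both machines gives a rough idea of what the wrapper itself costs. With the real library, the sending thread will probably dominate, so treat this only as a sanity check.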
One more thing: if you're worried about the hyper-threads getting in the way, hard-affinitize the process to a non-HT core. You can do this either via Task Manager in the GUI or via SetThreadAffinityMask.
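A rough sketch of the programmatic route, assuming you want to pin the calling thread to a single logical processor. The helper names mask_for_core and pin_current_thread are made up for illustration, and which logical processors share a physical core depends on the machine:

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Builds a mask with only the bit for `core` set (core 0 -> 0x1, core 2 -> 0x4).
// Returns 0 for core indexes that do not fit in a 64-bit mask.
std::uint64_t mask_for_core(unsigned core) {
    return core < 64 ? (std::uint64_t(1) << core) : 0;
}

// Pins the calling thread to one logical processor; returns true on success.
// On non-Windows builds this sketch does nothing and returns false.
bool pin_current_thread(unsigned core) {
    std::uint64_t mask = mask_for_core(core);
    if (mask == 0) return false;
#ifdef _WIN32
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(),
                                 static_cast<DWORD_PTR>(mask)) != 0;
#else
    return false;
#endif
}
```

Note that this only affects the thread that calls it. Since the library creates its own sending thread, restricting the whole process (Task Manager, or SetProcessAffinityMask) may be closer to what you want here.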
-Rick
Core i7's are hyper-threaded - do you have HT enabled?
Maybe the C++ code is somehow compiled to take advantage of HT whereas the C code is not. What does Task Manager look like when you're running your code? Is the load spread evenly across several cores, or are a few cores maxed out?
Just a wild guess: if you're compiling the library source along with your application, and the C API functions aren't declared extern "C", then maybe the C++ version is using a different and somehow faster calling convention?
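For reference, this is the usual shape of a C header that is safe to include from C++. It is a sketch only: the declarations follow the question, while struct source_s and the stub definitions at the bottom are hypothetical and exist just to make the example link.

```cpp
// Header part: what the library's .h file would typically contain.
#ifdef __cplusplus
extern "C" {   // give the declarations C linkage (no C++ name mangling)
#endif

typedef struct source_s source_t;
source_t* source_create(void);
int source_send(source_t* source, const char* data);

#ifdef __cplusplus
}
#endif

// Hypothetical stub definitions so the example is self-contained.
// They pick up C linkage from the declarations above.
struct source_s { int sent; };

source_t* source_create(void) {
    static source_s instance = { 0 };
    return &instance;
}

int source_send(source_t* source, const char* data) {
    if (data == 0) return -1;
    ++source->sent;
    return 0;
}
```

As far as I know, extern "C" mainly controls linkage and name mangling; with Visual C++ the calling convention is chosen separately (for example __cdecl or __stdcall, or the /Gd, /Gr and /Gz switches), so it is worth checking those settings in both projects too.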
Also, if you're compiling the library source along with your application, then maybe the C++ compiler is compiling your library source as C++, and is somehow better at optimizing than your C compiler?