Best strategy for profiling memory usage of my code (open source) and 3rd party code (closed source)
I am soon going to be tasked with doing a proper memory profile of code that is written in C/C++ and uses CUDA to take advantage of GPU processing.
My initial thought would be to create macros and operator overloads that would allow me to track calls to malloc, free, new, and delete within my source code. I would just include a different header and use the __FILE__ and __LINE__ macros to print memory calls to a log file. This type of strategy is described here: http://www.almostinfinite.com/memtrack.html
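Something like this is what I have in mind (a rough sketch of the idea, not the code from that link; malloc/free could be covered with similar function-like macros):

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Call-site-aware operator new: the macro below routes 'new Foo' here.
void* operator new(std::size_t size, const char* file, int line)
{
    void* p = std::malloc(size);
    std::fprintf(stderr, "alloc %zu bytes at %p (%s:%d)\n", size, p, file, line);
    return p;
}

// Matching placement delete, called only if a constructor throws.
void operator delete(void* p, const char* file, int line) noexcept
{
    std::fprintf(stderr, "ctor threw, freeing %p (%s:%d)\n", p, file, line);
    std::free(p);
}

// Plain global new/delete, replaced so every allocation goes through malloc/free.
void* operator new(std::size_t size)
{
    return std::malloc(size);   // a real version should handle failure (std::bad_alloc)
}

void operator delete(void* p) noexcept
{
    std::fprintf(stderr, "free %p\n", p);
    std::free(p);
}

// Rewrites 'new' at the call site in any file that includes this header.
#define new new(__FILE__, __LINE__)
```

One caveat I'm already aware of: redefining new with a macro breaks placement new in any header included afterwards, so the tracking header would have to be included last.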
What is the best way to track that usage in a linked-in 3rd party library? I am assuming I'd pretty much only be able to track memory usage before and after its function calls, correct? In my macro/overload scenario, I can simply track the size of the requests to figure out how much memory is being asked for. How would I be able to tell how much the 3rd party lib is using? It is also my understanding that tracking "free" doesn't really tell you how much memory the code is using at any particular time, because freed memory is not necessarily returned to the OS. I appreciate any discussion of the matter.
I don't really want to use memory profiling tools like TotalView or valgrind, because they typically do a lot of other things (bounds checking, etc.) that seem to make the software run very slowly. Another reason is that I want this to be somewhat thread safe - the software uses MPI, I believe, to spawn processes. I am going to try to profile this in real time so I can dump to log files or something that can be read by another process to visualize memory usage as the software runs. This is also primarily going to be run in a Linux environment.
Thanks
Maybe the linker option --wrap=symbol can help you. A really good example can be found in man ld.
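A sketch based on that man ld example, extended to free as well (the names follow the __wrap_/__real_ convention the linker expects):

```cpp
// Link with:  g++ ... -Wl,--wrap=malloc -Wl,--wrap=free
// The linker then redirects calls to malloc/free in the objects you link
// (including static third-party libraries) to __wrap_malloc/__wrap_free,
// while __real_malloc/__real_free resolve to the original allocator.
#include <cstdio>
#include <cstddef>

extern "C" void* __real_malloc(std::size_t size);
extern "C" void  __real_free(void* ptr);

extern "C" void* __wrap_malloc(std::size_t size)
{
    void* p = __real_malloc(size);
    std::fprintf(stderr, "malloc(%zu) = %p\n", size, p);
    return p;
}

extern "C" void __wrap_free(void* ptr)
{
    std::fprintf(stderr, "free(%p)\n", ptr);
    __real_free(ptr);
}
```

Note that --wrap only affects references resolved at link time, so allocations made inside a pre-built shared library will not go through the wrapper.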
Maybe valgrind and the Massif tool?
To track real-time memory consumption of my programs on Linux I simply read /proc/[pid]/stat. It's a fairly lightweight operation, and its cost could be negligible in your case if the 3rd party library you want to track does substantial work. If you want memory information while the 3rd party library is running, you can read the stat file from an independent thread or from another process. (Memory peaks rarely happen right before or after function calls!)
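A minimal sketch of the sampling-thread variant, reading /proc/self/statm (a simpler-to-parse sibling of the stat file); the log file name and sampling interval are just placeholders:

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <thread>
#include <unistd.h>

std::atomic<bool> keep_sampling{true};

// Periodically log virtual size and RSS; statm reports pages, so scale by page size.
void sample_memory(const char* logfile, int interval_ms)
{
    const long page = sysconf(_SC_PAGESIZE);
    std::FILE* log = std::fopen(logfile, "w");
    if (!log) return;
    while (keep_sampling) {
        std::ifstream statm("/proc/self/statm");
        long vsize_pages = 0, rss_pages = 0;
        statm >> vsize_pages >> rss_pages;   // first two fields: size, resident
        std::fprintf(log, "%ld %ld\n", vsize_pages * page, rss_pages * page);
        std::fflush(log);
        std::this_thread::sleep_for(std::chrono::milliseconds(interval_ms));
    }
    std::fclose(log);
}

int main()
{
    std::thread sampler(sample_memory, "memlog.txt", 100);
    // ... run the real workload, including the 3rd party calls ...
    keep_sampling = false;
    sampler.join();
}
```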
For the CUDA/GPU side I think gDEBugger could help you. I am not sure, but the memory analyzer should not affect performance much.
You could try Google's PerfTools' Heap-Profiler:
http://google-perftools.googlecode.com/svn/trunk/doc/heapprofile.html
It's very lightweight; it literally replaces malloc/calloc/realloc/free to add instrumentation code. It's primarily tested on Linux platforms.
If you have compiled with debugging symbols, and your third-party libraries come with debug-version variants, PerfTools should do very well. If you don't have debug-symbol libraries, build your code with debug symbols anyway. It will give you detailed numbers for your code, and everything left over can be attributed to the third-party library.
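If you'd rather drive it from code than from the environment, the explicit API looks roughly like this (the header path differs between older and newer releases, google/ vs. gperftools/; the placeholder workload stands in for the third-party call; link with -ltcmalloc):

```cpp
#include <gperftools/heap-profiler.h>
#include <vector>

// Placeholder for the closed-source library call you want to bracket.
void run_third_party_work() { std::vector<int> v(1 << 20); (void)v; }

int main()
{
    HeapProfilerStart("myapp");          // writes myapp.0001.heap, myapp.0002.heap, ...
    run_third_party_work();
    HeapProfilerDump("after 3rd-party"); // labeled snapshot
    HeapProfilerStop();
}
```

Alternatively, you can LD_PRELOAD the tcmalloc library and set the HEAPPROFILE environment variable, which needs no code changes and also catches allocations made inside the closed-source library.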
If you don't want to use an "external" tool, you can try to use tools like:
mtrace
It installs handlers for malloc, realloc and free and logs every operation to a file. See the Wikipedia article I linked for code usage examples; a minimal sketch also follows after this list.
dmalloc
It's a library you can use in your code that can find memory leaks, off-by-one errors and usage of invalid addresses. You can also disable it at compile time with -DDMALLOC_DISABLE.
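The mtrace sketch (glibc; the log file name is just an example, and the resulting log can be post-processed with the mtrace script that ships with glibc):

```cpp
#include <cstdlib>
#include <mcheck.h>

int main()
{
    setenv("MALLOC_TRACE", "mtrace.log", 1); // output file for the trace
    mtrace();                                // start logging malloc/realloc/free

    void* p = std::malloc(128);
    std::free(p);

    muntrace();                              // stop logging
}
```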
Anyway, I would rather not take this approach. Instead, I suggest you stress-test your application on a test server under valgrind (or any equivalent tool) to make sure you're doing memory allocation right, and then let the application run without any memory-allocation checking in production to maximize speed. But, in fact, it depends on what your application does and what your needs are.
You could use the profiler included in Visual Studio 2010 Premium and Ultimate.
It lets you choose between different methods of performance measurement; the most useful for you will probably be CPU sampling, because it freezes your program at arbitrary time intervals and figures out which functions it is currently executing, thereby not making your program run substantially slower.
I believe that this question has two very separate answers. One for C/C++ land. And a second for CUDA land.
On the CPU:
I've written my own replacements for new and delete. They were horribly slow and didn't help much. I've used TotalView. I like TotalView for OpenMP debugging, but I agree it is very slow for memory debugging. I've never tried valgrind. I've heard similar things.
The only memory debugging tool which I've encountered worth its salt is Intel Parallel Inspector's Memory Checker. Note: as I'm a student, I was able to get an education license on the cheap. That said, it's amazing. It took me twelve minutes to find a memory leak buried in half a million lines of code -- I wasn't releasing a thrown error object which I caught and ignored. I like this one piece of software so much that when my RAID failed / Win 7 ate my computer (think autoupdate and RAID rebuild simultaneously), I stopped everything and rebuilt the computer, because I knew it would take me less time to rebuild the dual boot (48 hours) than it would've taken to find the memory leak another way. If you don't believe my outlandish claims, download an evaluation version.
On the GPU:
I think you're out of luck. For all memory issues in CUDA, I've essentially had to home-grow my own tools and wrappers around cudaMalloc etc. It isn't pretty. nSight does buy you something, but at this point not much beyond a "here's how much you've allocated riiiight now". And on that sad note, almost every performance issue I've had with CUDA has been directly dependent on my memory access patterns (that or my thread block size).
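For what it's worth, the wrappers I mean are roughly of this shape (names like trackedMalloc and TRACKED_CUDA_MALLOC are just placeholders; a matching cudaFree wrapper would need a pointer-to-size map to decrement the counter):

```cpp
// Compile with nvcc, or with a host compiler linked against -lcudart.
#include <cstdio>
#include <atomic>
#include <cuda_runtime.h>

static std::atomic<long long> g_device_bytes{0};

// Track outstanding device bytes per call site and cross-check with cudaMemGetInfo.
cudaError_t trackedMalloc(void** ptr, size_t size, const char* file, int line)
{
    cudaError_t err = cudaMalloc(ptr, size);
    if (err == cudaSuccess) {
        g_device_bytes += static_cast<long long>(size);
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);
        std::fprintf(stderr, "[GPU] +%zu B at %s:%d (tracked %lld B, device free %zu/%zu B)\n",
                     size, file, line, g_device_bytes.load(), free_b, total_b);
    }
    return err;
}

#define TRACKED_CUDA_MALLOC(ptr, size) trackedMalloc((void**)(ptr), (size), __FILE__, __LINE__)

int main()
{
    float* d = nullptr;
    if (TRACKED_CUDA_MALLOC(&d, 1024 * sizeof(float)) == cudaSuccess)
        cudaFree(d);   // a tracked free would decrement g_device_bytes here
}
```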