Drastic CPU drop with C program
My program is experiencing a nasty drop in performance. It is basically a pair of nested for loops which perform an operation on a pair of data sets and then write the result. The problem is, after about 500 of the 300,000 pairs it slows from taking 0.07 seconds per pair to 5 seconds per pair, and CPU usage drops from nearly 100% to ~4%. All memory used throughout is allocated before the nested loops and freed after the loops.
Here's pseudo code so you can hopefully get the idea:
for (i = 0; i < 759; i++) {
    read_binary_data(data_file_1, data_1);
    read_binary_header(header_file_1, header_1);
    for (j = i + 1; j < 760; j++) {
        read_binary_data(data_file_2, data_2);
        read_binary_header(header_file_2, header_2);
        do_operation(data_1, data_2, out_data);
        update_header_data(header_1, header_2, out_header);
        write_binary_data_and_header(out_data, out_header);
    }
}
I've put timing flags at the beginning and end of the inner for loop to get the numbers quoted above, but I was wondering if there might be better debugging options to show me why the operation is slowing down. The only thought I've had so far is file system blocking, but I only open 5-6 files on each run and each is closed at the end of its subroutine.
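In case it helps, a finer breakdown would look roughly like this; a trimmed sketch, assuming a POSIX system with clock_gettime, and reusing the names from the pseudo code above:

#include <stdio.h>
#include <time.h>

/* Elapsed time between two timespecs, in seconds. */
static double elapsed(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Inside the inner loop, each phase is bracketed separately: */
struct timespec t0, t1, t2, t3;
clock_gettime(CLOCK_MONOTONIC, &t0);
read_binary_data(data_file_2, data_2);
read_binary_header(header_file_2, header_2);
clock_gettime(CLOCK_MONOTONIC, &t1);
do_operation(data_1, data_2, out_data);
update_header_data(header_1, header_2, out_header);
clock_gettime(CLOCK_MONOTONIC, &t2);
write_binary_data_and_header(out_data, out_header);
clock_gettime(CLOCK_MONOTONIC, &t3);
fprintf(stderr, "pair %d,%d: read %.3fs  op %.3fs  write %.3fs\n",
        i, j, elapsed(t0, t1), elapsed(t1, t2), elapsed(t2, t3));

That would at least separate read time from compute time from write time per pair.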
Update at 10:15 P.M. Pacific time:
After various tests, I've found the culprit seems to be the read_binary_data portion; it can take over 3 seconds for many of the files. I'm going to attempt to pack all of the binary data into one file and read it all at once so I only need the one read. I'm betting I'll run out of memory, but it's worth a shot, and if that happens I'll just be less ambitious and try to hold fewer than 760 * 2 * 31 * 43201 floats in an array at a time (I guess that should be around 8 GB at 4 bytes per float?).

Are you freeing the buffers that you are holding the data in? It sounds like you have exhausted memory and switched to swap after 500 files. What is your memory usage like?
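One cheap way to check is to print the process's peak resident set size every few hundred pairs and see whether it keeps climbing. A minimal sketch, assuming a POSIX system (on Linux, ru_maxrss is reported in kilobytes):

#include <stdio.h>
#include <sys/resource.h>

/* Print the process's peak resident set size so far; call this
 * periodically inside the loops to see whether memory keeps growing. */
static void print_peak_rss(const char *label)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        fprintf(stderr, "%s: peak RSS %ld KB\n", label, ru.ru_maxrss);
}

If that number (or what top/vmstat reports) levels off well below physical RAM, swapping probably isn't the issue.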
Perhaps your writing to file is done inefficiently and as you progress you need to do more and more seeks?
Perhaps comment out the two lines that write to disk and see if you get a consistent run.
Otherwise, it could be your reads. It's hard to tell how you have actually done the file operations, but it's easy to do them in a really expensive way.
Either way, if your CPU is low and your memory is low, you are left with blocking I/O operations!
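As a purely hypothetical illustration of "really expensive" versus cheap (we haven't seen read_binary_data; the 43201-float record size is taken from your update):

#include <stdio.h>

#define N_FLOATS 43201  /* per-record float count, taken from the question's update */

/* Expensive pattern: one fread call per float means tens of thousands of
 * library calls (and potentially system calls) per record. */
static size_t read_slow(FILE *fp, float *buf)
{
    size_t n = 0;
    for (size_t k = 0; k < N_FLOATS; k++)
        n += fread(&buf[k], sizeof(float), 1, fp);
    return n;
}

/* Cheap pattern: one fread call for the whole record. */
static size_t read_fast(FILE *fp, float *buf)
{
    return fread(buf, sizeof(float), N_FLOATS, fp);
}

The same distinction applies to the writes, and to reopening and seeking in files that could stay open.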
The first things that spring to mind, despite your claim that memory is not being allocated inside the loop, are
- Memory leak
- Memory fragmentation
- Cache saturation
Without more details on what's actually going on, like the environment you're running in or what other functions your functions are calling, it's really impossible to speculate more. The problem is just too abstract.
First, to your actual question: C itself has no debugging options to do with I/O performance or any other kind of performance. Your IDE, debugger, or OS might, although I'm afraid I don't know the details of any.
Stupid question - do all the loops produce the same amount of output? Maybe the first 500 are small.
It could be that 500 loops is how long it takes to fill the disk write cache (at one or more levels - process, OS, hardware), and after that the program is I/O bound. Can't really say whether that's likely, without knowing the amounts of data involved.
Try writing 1GB of data to a file, and time it, to get a very rough idea of what sustained rate is plausible. If 0.07 seconds per pair, times the amount of data per pair, works out faster than that rate, then your initial fast rate is a one-time-only special offer: the disk will have to catch up sooner or later.
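A rough sketch of that test, assuming a POSIX system (note the OS may still be flushing its write cache when the program exits, so treat the number as an upper bound on sustained throughput):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Write 1 GB in 1 MB chunks and report the elapsed time and rate. */
int main(void)
{
    enum { CHUNK = 1 << 20, CHUNKS = 1024 };   /* 1 MB * 1024 = 1 GB */
    char *buf = calloc(1, CHUNK);
    FILE *fp = fopen("write_test.bin", "wb");
    if (!buf || !fp) { perror("setup"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int k = 0; k < CHUNKS; k++)
        fwrite(buf, 1, CHUNK, fp);
    fflush(fp);
    fclose(fp);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("wrote 1 GB in %.2f s (%.1f MB/s)\n", s, 1024.0 / s);

    free(buf);
    return 0;
}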
Beyond that, think more about what your output is actually doing, since you don't give details. Writing in a straight line? Seeking back and forth? Inserting records into an ordered array on disk, so that each write has to move, on average, 50% of the data written so far? Different access patterns obviously have very different expected performance over time.
I focus on output rather than input, on the assumption that the read cache is useless, so that your read speeds will be fairly consistent throughout. That's not necessarily the case, but if the computer can't predict your access patterns then it's a pretty good approximation.
Even so, 300000 * 5 seconds is over 400 hours. This is enough time for any mere mortal computer to write its entire hard disk many times over. So you'd have to be doing something quite odd for raw write speed to be all there is to it.
Unless you allocate so much memory that the system starts swapping, you're I/O bound.
You are doing a linear-search kind of thing. Is your data stored in files?
If it is, you could read all of the data in one pass and then keep it in memory, for example in a binary search tree. That would reduce the time complexity of your program.
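The essential point there is to read each record from disk once, up front, instead of re-reading it in the inner loop, since most of the 760 files end up being read hundreds of times over the course of the outer loop. A minimal sketch of that idea, preloading into a flat array indexed by record number rather than a search tree (the counts, file layout, and function name here are guesses based on the question):

#include <stdio.h>
#include <stdlib.h>

#define N_RECORDS 760    /* loop bound taken from the question's pseudo code */
#define N_FLOATS  43201  /* floats per record, per the question's update     */

/* Hypothetical preload step: read every record once before the nested loops,
 * so the inner loop becomes pure computation instead of repeated file I/O.
 * Assumes each record lives in its own binary file of N_FLOATS floats. */
static float *load_all_records(char *const paths[N_RECORDS])
{
    float *all = malloc((size_t)N_RECORDS * N_FLOATS * sizeof(float));
    if (!all)
        return NULL;   /* about 130 MB at these sizes; adjust if records are larger */

    for (int i = 0; i < N_RECORDS; i++) {
        FILE *fp = fopen(paths[i], "rb");
        if (!fp || fread(&all[(size_t)i * N_FLOATS], sizeof(float),
                         N_FLOATS, fp) != N_FLOATS) {
            if (fp) fclose(fp);
            free(all);
            return NULL;
        }
        fclose(fp);
    }
    return all;   /* record i starts at all + (size_t)i * N_FLOATS */
}

If the real records are too large to hold all at once, even caching a window of recently used records would remove most of the repeated reads.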