fopen() hangs. Sometimes
I am running on Debian Etch: Linux nereus 2.6.18-6-686 #1 SMP Sat Dec 27 09:31:05 UTC 2008 i686 GNU/Linux
I have a multithreaded C application, and one thread is hanging. Sometimes. From the core files, I have figured out that it is hanging in fopen():
#0 0xb7f4b410 in ?? ()
#1 0xb660521c in ?? ()
#2 0x000001b6 in ?? ()
#3 0x00008241 in ?? ()
#4 0xb77c45bb in open () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7768142 in _IO_file_open () from /lib/tls/i686/cmov/libc.so.6
#6 0xb77682e8 in _IO_file_fopen () from /lib/tls/i686/cmov/libc.so.6
#7 0xb775d8c9 in fgets () from /lib/tls/i686/cmov/libc.so.6
#8 0xb775fe0a in fopen64 () from /lib/tls/i686/cmov/libc.so.6
#9 0x0805600f in comric_write_external_track_file (control=0xbfc9c284) at ../COMRIC/comric_thread.c:784
#10 0x08055b0e in store_tracks (control=0xbfc9c284, hdr=0xb3d1b828) at ../COMRIC/comric_thread.c:695
#11 0x080568be in comric_thread (userdata=0xbfc9c284) at ../COMRIC/comric_thread.c:997
#12 0xb789530f in g_thread_create_full () from /usr/lib/libglib-2.0.so.0
#13 0xb783f240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#14 0xb77d349e in clone () from /lib/tls/i686/cmov/libc.so.6
This thread gets data from an external source, processes it, and writes it to a text file. The text file is being written over and over and over again, as we get new data. No one else is accessing this file. The file size is typically less than 1KB. I am checking the fclose() call to make sure it is returning success, and it is.
When the main thread detects that we haven't heard from the problem thread in more than 30 seconds, it calls abort() so we can get the core dump you see above.
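The watchdog described above could be sketched roughly as follows. This is a minimal, hypothetical version: the `struct heartbeat` type and the `heartbeat_beat()`, `heartbeat_stale()` and `watchdog_check()` functions are not from the original application. The worker records a timestamp after each unit of work, and the main thread calls abort() once that timestamp is more than 30 seconds old. With core dumps enabled (`ulimit -c`), the resulting core file holds the stacks of all threads. CLOCK_MONOTONIC is used so that wall-clock adjustments cannot trigger or mask the timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical heartbeat shared between the worker thread and the main thread. */
struct heartbeat {
    pthread_mutex_t lock;
    time_t last_beat;   /* monotonic seconds of the worker's last check-in */
};

/* Seconds on a clock that never jumps when the system time is changed. */
static time_t monotonic_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

/* Called by the worker thread after each unit of work. */
void heartbeat_beat(struct heartbeat *hb) {
    pthread_mutex_lock(&hb->lock);
    hb->last_beat = monotonic_seconds();
    pthread_mutex_unlock(&hb->lock);
}

/* Returns nonzero if the worker has been silent for more than `limit` seconds. */
int heartbeat_stale(struct heartbeat *hb, time_t limit) {
    time_t last;
    pthread_mutex_lock(&hb->lock);
    last = hb->last_beat;
    pthread_mutex_unlock(&hb->lock);
    return monotonic_seconds() - last > limit;
}

/* Main-thread check: abort() so a core file with every thread's stack is written. */
void watchdog_check(struct heartbeat *hb) {
    if (heartbeat_stale(hb, 30))
        abort();
}
```

The worker would call `heartbeat_beat(&hb)` at the end of each loop iteration, and the main thread would call `watchdog_check(&hb)` periodically, for example once a second.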
99% of the time, everything runs smoothly. But in the last four days, this has been happening more and more (6+ times a day). I was worried that it might be a hard drive problem, but I cannot find any errors reported in any of the logs. (Unfortunately, SMART information is not available.) This application has been running smoothly for 2 years.
Anyone have any thoughts?
Source code:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int comric_write_external_track_file( struct ComricControl *control ) {
    FILE *file;

    if( strlen( control->extern_track_file ) == 0 ) return 1;

    file = fopen( control->extern_track_file, "w" );
    if( !file ) {
        ps_slog( "ERROR opening external track file: \"%s\"", control->extern_track_file );
        return 0;
    }

    // Write the file
    G_MUTEX_LOCK( control->mutex );
    g_hash_table_foreach( control->tracks, comric_write_track, file );
    G_MUTEX_UNLOCK( control->mutex );

    // fsync() only syncs data the kernel already has, so push stdio's buffer to it first
    fflush( file );
    fsync( fileno( file ));

    if( fclose( file ) != 0 ) {
        ps_slog( "FATAL ERROR - fclose() FAILED with error \"%s\" (%d)", strerror( errno ), errno );
        sleep( 1 ); abort(); // can we get any debug info out of this?
    }
    return 1;
}
I added the fsync() call after doing some hunting on the net. At first, I thought the problem might be related to fclose() failing, but that doesn't seem to be the case.
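One detail about that fsync() call: fsync() only forces data the kernel already holds out to disk, while anything written with fprintf()/fputs() may still be sitting in the FILE's user-space buffer until fflush() or fclose(). A minimal sketch of the full sequence with every step checked (the helper name `flush_sync_close` is hypothetical) might look like this:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Flush stdio's buffer into the kernel, ask the kernel to commit the data
   to disk, then close. The FILE is always closed. Returns 0 on success,
   or the errno of the first step that failed. */
int flush_sync_close(FILE *file) {
    int err = 0;

    if (fflush(file) != 0)
        err = errno;                  /* user-space buffer could not be written */
    else if (fsync(fileno(file)) != 0)
        err = errno;                  /* kernel could not commit the data */

    if (fclose(file) != 0 && err == 0)
        err = errno;                  /* report close errors only if nothing failed earlier */

    return err;
}
```

In comric_write_external_track_file(), the fsync()/fclose() pair could then become a single call, logging strerror() of the return value when it is nonzero. This makes the sync meaningful, though by itself it does not explain a hang inside open().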
You're hanging in open(), so this is likely a problem at the kernel or driver level.
First, check dmesg for obvious error messages. If that turns up nothing, you can try using the SysRq w command to get a stack trace of the offending process.
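On kernels that support it, SysRq w dumps the tasks in uninterruptible (blocked) state to the kernel log, where dmesg can show them; as root it can be triggered with `echo w > /proc/sysrq-trigger`. In addition, the watchdog could record the stuck thread's scheduler state just before calling abort(). A thread in state `D` (uninterruptible sleep) is blocked inside the kernel, typically waiting on I/O, which would support the kernel/driver theory. The sketch below reads that state from `/proc/<pid>/task/<tid>/stat`. The function name `thread_state` is hypothetical, and the worker's thread ID would have to be saved when the thread starts, for example with `syscall(SYS_gettid)`, since older glibc versions have no gettid() wrapper.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return the scheduler state letter of thread `tid` in process `pid`
   ('R' running, 'S' sleeping, 'D' uninterruptible sleep, ...),
   or '?' if it cannot be read. */
char thread_state(pid_t pid, pid_t tid) {
    char path[64], buf[512];
    FILE *f;
    size_t n;
    char *paren;

    snprintf(path, sizeof path, "/proc/%d/task/%d/stat", (int)pid, (int)tid);
    f = fopen(path, "r");
    if (!f)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is wrapped in parentheses and may itself contain
       spaces or ')', so search for the LAST ')'; the state letter follows. */
    paren = strrchr(buf, ')');
    if (!paren || paren[1] != ' ' || paren[2] == '\0')
        return '?';
    return paren[2];
}
```

Before aborting, the watchdog might then log something like `thread_state(getpid(), worker_tid)`, where `worker_tid` is the hypothetical saved thread ID; a `D` there, together with the SysRq w output, points at the kernel rather than at the application.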