How can I debug a Perl program that suddenly exits?
I have a Perl program based on IO::Async, and it sometimes just exits after a few hours or days without printing any error message whatsoever. There's nothing in dmesg or /var/log either. STDOUT and STDERR are both set to autoflush(1), so data shouldn't be lost in buffers. It doesn't actually return from IO::Async::Loop->loop_forever: the print I put right after it, just to make sure of that, never gets triggered.
Now, one way would be to keep peppering the program with more and more prints and hope one of them gives me some clue. Is there a better way to find out what was going on in a program that made it exit or silently crash?
One trick I've used is to run the program under strace or ltrace (or attach to the running process using strace). Naturally, that was under Linux. Under other operating systems you'd use ktrace or dtrace or whatever is appropriate.
A trick I've used for programs that only exhibit sparse issues over days or weeks, and then only on a handful among hundreds of systems, is to direct the output from my tracer to a FIFO and have a custom program keep only 10K lines in a ring buffer, with a handler on SIGPIPE and SIGHUP to dump the current buffer contents into a file. (It's a simple program, but I don't have a copy handy and I'm not going to rewrite it tonight; my copy was written for internal use and is owned by a former employer.)
The ring buffer allows the program to run indefinitely without fear of running systems out of disk space. We usually only need a few hundred, or perhaps a couple of thousand, lines of the trace in such matters.
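The original tool is not available, so the following is only a minimal sketch of the idea in Perl, not a reconstruction of it. The script name, the paths in the usage comment, and the helper names push_line, dump_ring and main are all assumptions made for illustration. It also dumps the buffer when the input ends, which the description above does not mention.

```perl
#!/usr/bin/perl
# ringbuf.pl (hypothetical name): keep only the last N lines read from a FIFO.
#
# Possible usage; the paths and PID are placeholders:
#   mkfifo /tmp/trace.fifo
#   perl ringbuf.pl /tmp/trace.fifo /tmp/trace.tail &
#   strace -f -tt -o /tmp/trace.fifo -p PID
use strict;
use warnings;

use constant MAX_LINES => 10_000;   # the "10K lines" from the description

# Append a line and drop the oldest ones once the buffer exceeds $max.
sub push_line {
    my ($ring, $max, $line) = @_;
    push @$ring, $line;
    shift @$ring while @$ring > $max;
    return;
}

# Write the buffered lines, oldest first, to $path (overwriting it).
sub dump_ring {
    my ($ring, $path) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!\n";
    print {$out} @$ring;
    close $out or die "Cannot close $path: $!\n";
    return;
}

sub main {
    my ($fifo, $dump) = @_;
    die "usage: $0 FIFO DUMPFILE\n" unless defined $dump;

    my @ring;
    # Dump on request. Perl defers signal handlers to safe points,
    # so doing file I/O inside them is acceptable here.
    $SIG{HUP} = $SIG{PIPE} = sub { dump_ring(\@ring, $dump) };

    # Opening a FIFO for reading blocks until the tracer opens it for writing.
    open my $in, '<', $fifo or die "Cannot open $fifo: $!\n";
    while (defined(my $line = <$in>)) {
        push_line(\@ring, MAX_LINES, $line);
    }

    # End of input means the tracer went away. Dumping here is an
    # assumption of this sketch, not part of the description.
    dump_ring(\@ring, $dump);
    return;
}

# Run only when arguments are given, so the subs can also be loaded on their own.
main(@ARGV) if @ARGV;
```

Sending SIGHUP to the script (kill -HUP) writes the current tail of the trace to the dump file without stopping the capture. If the traced program dies, the tracer exits, the FIFO reaches end of file, and the last lines before the exit end up in the dump file.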
If you are capturing STDERR, you could start the program as perl -MCarp::Always foo_prog. Carp::Always forces a stack trace on every die and warn.
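If Carp::Always (a CPAN module) is not installed, a rough stand-in can be sketched with the core Carp module. This is an approximation, not a copy of what Carp::Always does internally, and its output is less tidy. The subs inner and outer are hypothetical and exist only to produce a nested call.

```perl
use strict;
use warnings;
use Carp ();

# Turn every die into a full backtrace; pass exception objects through untouched.
$SIG{__DIE__}  = sub { ref $_[0] ? die $_[0] : Carp::confess(@_) };
# Turn every warn into a backtrace on STDERR.
$SIG{__WARN__} = sub { Carp::cluck(@_) };

# Hypothetical subs, only here to create a nested call chain.
sub inner { die "something broke" }
sub outer { inner() }

# Demonstration: the caught message now includes the chain of callers.
eval { outer(); 1 } or print STDERR "Caught: $@";
```

Note that a __DIE__ handler also fires for exceptions that are later caught by eval, so a program that uses exceptions for normal control flow will produce traces for those as well.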
A sudden exit without any error message is possibly a SIGPIPE. Traditionally, SIGPIPE is used to stop things like the cat command in the following pipeline:
cat file | head -10
It doesn't usually result in anything being printed, either by libc or by perl, to indicate what happened.
Since in an IO::Async-based program you would not want to exit silently on SIGPIPE, my suggestion would be to put a line something like this somewhere in the main file of the program:
$SIG{PIPE} = sub { die "Aborting on SIGPIPE\n" };
That will at least alert you to the fact. If you instead use Carp::croak without the \n, you might even be lucky enough to get the file and line number of the syswrite (or similar call) that caused the SIGPIPE.
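To see what such a handler reports, here is a small self-contained sketch that provokes a SIGPIPE on purpose. It uses Carp::confess rather than croak, a substitution made here because confess always appends a full backtrace. The helper write_to_closed_pipe is hypothetical and exists only for the demonstration.

```perl
use strict;
use warnings;
use Carp ();

# confess (used here instead of croak) always appends a full backtrace.
$SIG{PIPE} = sub { Carp::confess("Aborting on SIGPIPE") };

# Hypothetical helper: provoke SIGPIPE by writing to a pipe nobody reads.
sub write_to_closed_pipe {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    close $reader;
    syswrite($writer, "x");   # the kernel raises SIGPIPE during this write
    return 1;                 # Perl runs the deferred handler at a safe point around here
}

# Demonstration: catch the abort and show the message with its backtrace.
eval { write_to_closed_pipe(); 1 } or print STDERR "Caught: $@";
```

Because Perl defers signal handlers to a safe point, the line number in the trace may belong to the statement just after the failing write rather than to the write itself. The enclosing sub names in the backtrace still show which code path was writing.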