Perl: How to add an interrupt handler so one can control a code executed by mpirun via system()?
We use a cluster with Perceus (warewulf) software to do some computing. This software package has a wwmpirun program (a Perl script) that prepares a hostfile and executes mpirun:
# ...
system("$mpirun -hostfile $tmp_hostfile -np $mpirun_np @ARGV");
# ...
We use this script to run a math program (CODE) on several nodes. CODE is normally supposed to be stopped with Ctrl+C, which brings up a short menu with options: status, stop, and halt. However, when running under MPI, pressing Ctrl+C kills CODE outright, with loss of data.
The developers of CODE suggest a workaround: the program can be stopped by creating a file named stop%s, where %s is the name of the task file being executed by CODE. This lets us stop the run, but we cannot get the status of the calculation. Sometimes a run takes a really long time, and getting this function back would be much appreciated.
What do you think: is the problem in CODE or in mpirun?
Can one find a way to communicate with CODE when it is executed by mpirun?
UPDATE1
In a single run, one gets the status of the calculation by pressing Ctrl+C and choosing the status option in the provided menu by entering s. CODE prints status information to STDOUT and continues the calculation.
"we cannot get status of calculation" - what does that mean? Do you expect to get the status somehow but are not getting it? Or is the software not designed to give you status?
Your system call doesn't redirect standard error/output anywhere. Is that where the status is supposed to appear? If so, capture it by opening a pipe, or by redirecting to a log and having the wrapper read the log.
Also, you're not processing the return code by evaluating the return value of the system call; that may be another way the program communicates.
Your Ctrl+C problem might be because Ctrl+C is caught by the Perl wrapper, which dies, instead of by CODE, which has a nice Ctrl+C interrupt handler. The solution might be to add an interrupt handler to the wrapper that launches mpirun: see Perl Cookbook Recipe 16.18 for $SIG{INT}, or http://www.wellho.net/resources/ex.php4?item=p216/sigint ; you may want to have the Perl wrapper catch Ctrl+C and send the INT signal to the CODE it launched.
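A minimal sketch of the "catch Ctrl+C in the wrapper and pass INT on" idea, assuming the wrapper is changed to fork/exec the command instead of calling system(). The helper name run_forwarding_sigint is hypothetical and not part of wwmpirun. Two caveats: perldoc -f system states that SIGINT and SIGQUIT are ignored in the parent while system() runs, and a Ctrl+C typed at a terminal is normally delivered to the whole foreground process group, so the child may receive the signal directly as well. How mpirun itself reacts to SIGINT is outside what this sketch can control.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run a command as a child process, forward SIGINT
# to it instead of letting the wrapper die, and return its exit code.
sub run_forwarding_sigint {
    my @cmd = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: restore default SIGINT handling, then become the command.
        $SIG{INT} = 'DEFAULT';
        exec(@cmd) or do {
            warn "exec failed: $!\n";
            POSIX::_exit(127);
        };
    }

    # Parent: pass SIGINT on to the child and keep waiting for it.
    local $SIG{INT} = sub { local $!; kill 'INT', $pid };

    # waitpid can be interrupted by the signal; retry in that case.
    my $reaped;
    do { $reaped = waitpid($pid, 0) } while ($reaped == -1 && $!{EINTR});
    return -1 if $reaped == -1;

    # Low 7 bits of $? hold a terminating signal, the high byte the exit code.
    my $signal = $? & 127;
    return $signal ? 128 + $signal : $? >> 8;
}
```

In the wrapper, the system() line could then become something like my $rc = run_forwarding_sigint($mpirun, '-hostfile', $tmp_hostfile, '-np', $mpirun_np, @ARGV); which also gives the wrapper a return code to inspect.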