What is a reliable way to return an error code from an MPI program?
The MPI standard (page 295) says:
Advice to users. Whether the errorcode is returned from the executable or from the MPI process startup mechanism (e.g., mpiexec), is an aspect of quality of the MPI library but not mandatory.
Indeed I had no success in running the following code:
if(0 == my_rank)
{
    FILE* parameters = fopen("parameters.txt", "r");
    if(NULL == parameters)
    {
        fprintf(stderr, "Could not open parameters.txt file.\n");
        printf("Could not open parameters.txt file.\n");
        exit(EXIT_FAILURE); //Tried MPI_Abort() as well
    }
    fscanf(parameters, "%i %f %f %f", N, X_DIMENSION_Dp, Y_DIMENSION_Dp, HEIGHT_DIMENSION_Dp);
    fclose(parameters);
}
I am not able to get the error code back into the shell in order to decide on further actions. Neither of the two error messages is printed. I think I might write the error codes and messages to a dedicated file.
Has anyone had a similar problem, and what options did you consider for reliable error reporting?
EDIT:
The problem was not caused by MPI. What was really wrong was the way I treated the error codes that the scheduler returned. I use a system with LoadLeveler installed. First I run
$ llsubmit my_job_file.sh
then, upon completion of the job, I receive an email with the status of the job and its return code. In my case the return code was always zero, even if my MPI program had exited using the MPI_Abort function. Then I realized that the return code was that of the script my_job_file.sh itself, not that of the MPI program run within the script. my_job_file.sh looked like this:
# @ different LoadLeveler options ...
poe ./my_mpi_program > my_mpi_program.output
Then I modified it to be
# @ different LoadLeveler options ...
poe ./my_mpi_program > my_mpi_program.output
exit $?
and then I finally got the error code I wanted.
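If the job script ever needs to do more work after the MPI run (logging, cleanup), a slightly more defensive variant is to save the status in a variable right away, since every later command overwrites $?. Below is a minimal sketch of that idea, not my actual job file: run_job is a hypothetical stand-in for poe ./my_mpi_program that always fails with status 3, and the output goes to a temporary file created with mktemp so the sketch can be run anywhere.

```shell
#!/bin/sh
# Hypothetical stand-in for `poe ./my_mpi_program`; it always fails with status 3.
run_job() {
    echo "simulated job output"
    return 3
}

# Assumption for this sketch: write the output to a temporary file
# instead of my_mpi_program.output.
output_file=$(mktemp)

# Run the job with its output redirected and save the status immediately,
# before any other command overwrites $?.
run_job > "$output_file"
status=$?

# Later commands reset $?, but the saved copy is unaffected.
echo "job finished with status $status"

# In a real job script the last line would be: exit "$status"
```

The redirection does not change the exit status; $? after the redirected command is still the status of the command itself.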
MPI_Abort should work.
int MPI_Abort( MPI_Comm comm, int errorcode )