"p4_error: child process exited" error while running a 256-thread NAS benchmark on a 32-node cluster
I'm trying to get a UPC-NAS Benchmark (compiled for 256 threads) running on a 32-node cluster. When I run it, rsh connections are established for 247 threads, and then it terminates with the following error:
p0_11350: p4_error: Child process exited while making connection to remote process on dell16: 0
506 rm_l_237_24446: (26.785156) net_send: corm_11947: (215.339844) net_srm_l_1rm_24412: (26.785156) net_send: could not write to fd=4, errnrrrm_l_127_5013: (121.984375) net_send: could not w rite to fd=5, errno = 32
Can anybody point out where the problem lies?
It runs fine with fewer threads, such as 64 or 128.
Errno 32 is EPIPE (#define EPIPE 32 /* Broken pipe */).
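A broken pipe means net_send tried to write to a connection whose other end had already gone away. The minimal sketch below reproduces that condition locally, assuming a POSIX system: a Unix-domain socket pair stands in for the p4 connection, and the helper name errno_after_peer_close is hypothetical, not part of p4 or MPICH.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: writes to a stream socket whose peer has already
 * closed and returns the resulting errno (expected: EPIPE, which is 32 on
 * Linux), 0 if the write unexpectedly succeeded, or -1 if the socket pair
 * could not be created. */
int errno_after_peer_close(void)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Ignore SIGPIPE so write() returns -1 with EPIPE instead of the
     * kernel terminating the process; the old disposition is restored. */
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);

    close(fds[1]);                      /* the "remote process" exits */

    const char msg[] = "hello";
    ssize_t n = write(fds[0], msg, sizeof msg);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    if (old_handler != SIG_ERR)
        signal(SIGPIPE, old_handler);
    return err;
}
```

In other words, EPIPE on the surviving processes is usually a symptom: some peer (here, apparently the child process on dell16) died first, and everyone still talking to it then sees a broken pipe.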
I suspect that some file descriptor limit is being hit (check ulimit -a), or some network limit, or there is a network failure.
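The descriptor limit that ulimit -n reports can also be checked from C with getrlimit(RLIMIT_NOFILE). The sketch below is only an estimate under a stated assumption: it supposes each process keeps one socket to every other process (a fully connected mesh) plus a few extra descriptors, which is an illustration, not p4's documented connection pattern. The helper names fds_needed_full_mesh and fd_limit_allows are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Hypothetical estimate: descriptors one process needs if it holds one
 * socket to each of the other nprocs - 1 processes (ASSUMED full mesh),
 * plus stdin/stdout/stderr and one listening socket. */
long fds_needed_full_mesh(long nprocs)
{
    const long extra = 4;               /* 3 stdio + 1 listener (assumed) */
    return (nprocs - 1) + extra;
}

/* Returns 1 if the current soft RLIMIT_NOFILE (the value `ulimit -n`
 * shows) can hold `needed` descriptors, 0 if it cannot, and -1 if the
 * limit could not be read. */
int fd_limit_allows(long needed)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)needed <= rl.rlim_cur;
}
```

For example, fd_limit_allows(fds_needed_full_mesh(256)) returning 0 on a node would suggest raising the limit with ulimit -n before launching; a return of 1 points back toward network limits or a network failure instead.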
I should also mention that p4 is ancient, so this could be some internal limit of p4 itself. Development of p4 stopped more than 15 years ago; it is "stable" code in the same sense that Debian Stable is stable.
So why are you using mpich1? Can you move to the less ancient mpich2?