"p4_error: child process exited" error while running 256 threads of a NAS benchmark on a 32-node cluster

I'm trying to run a UPC NAS benchmark (compiled for 256 threads) on a cluster of 32 nodes. When I run it, the rsh connections are established for 247 threads, and then it terminates with the following error:

p0_11350:  p4_error: Child process exited while making connection to remote process on dell16: 0
506 rm_l_237_24446: (26.785156) net_send: corm_11947: (215.339844) net_srm_l_1rm_24412: (26.785156) net_send: could not write to fd=4, errnrrrm_l_127_5013: (121.984375) net_send: could not write to fd=5, errno = 32

Can anybody point out where the problem lies?

It runs fine with fewer threads, such as 64 or 128.


Errno 32 is EPIPE (#define EPIPE 32 /* Broken pipe */).
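
To double-check the errno mapping on your own nodes, here is a minimal Python sketch. It is only an illustration and has nothing to do with the p4 code itself; the helper names `describe_errno` and `write_to_closed_pipe` are made up for this example. It looks up errno 32 and then reproduces the same error by writing to a pipe whose reading side has already been closed, which is roughly what happens when the remote process dies while the sender is still writing.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and system message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def write_to_closed_pipe() -> int:
    """Write to a pipe whose read end is closed; return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the peer going away
    try:
        os.write(write_fd, b"x")
    except BrokenPipeError as exc:
        # Python ignores SIGPIPE by default, so the write fails with EPIPE
        # instead of killing the process.
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    print(describe_errno(32))
    print(write_to_closed_pipe() == errno.EPIPE)
```

So the net_send messages probably only tell you that the other end of the connection had already gone away; the real cause is whatever made those processes exit.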

My guess is that some file descriptor limit is being hit (check `ulimit -a`). It could also be a network limit or a network failure.
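
If you prefer to check the descriptor limit from a script rather than from the shell, something like the sketch below could be run on each node. It reads the same limit that `ulimit -n` reports. The helper names `fd_limits` and `fits` are made up for this example, and the estimate of one descriptor per peer process is only my assumption, not something taken from the p4 sources.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits(needed: int, soft: int) -> bool:
    """True if `needed` descriptors fit under the soft limit."""
    return soft == resource.RLIM_INFINITY or needed <= soft


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # Assumption: roughly one descriptor per peer process, plus a few
    # for stdin/stdout/stderr and other files the process keeps open.
    needed = 256 + 16
    print(f"room for {needed} descriptors: {fits(needed, soft)}")
```

A limit that is generous on the launching node but low on the compute nodes would be consistent with 64 and 128 threads working while 256 fails, but that is only one possible explanation.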

I should also mention that p4 is ancient, so you may be hitting some internal limit. Development of p4 stopped more than 15 years ago. It is very stable code, roughly in the sense that Debian Stable is.

So why are you using MPICH1? Can you move to the less ancient MPICH2?
