MPI error due to Timeout in making connection to a remote process
I'm trying to run a NAS-UPC benchmark to study it's profile. UPC uses MPI to communicate with remote processes .
When I run the benchmark with 64 processes , i get the following error
upcrun -n 64 bt.C.64
"Timeout in making connection to remote process on <<machine name>>"
Can anybody tell me why this e开发者_Python百科rror occurs ?
this probably means that you're failing to spawn the remote processes - upcrun delegates that to a per-conduit mechanism, which may involve your scheduler (if any). my guess is that you're depending on ssh-type remote access, and that's failing, probably because you don't have keys, agent or host-based trust set up. can you ssh to your remote nodes without password? sane environment on the remote nodes (paths, etc)?
"upcrun -v" may illuminate the problem, even without resorting to the man page ;)
精彩评论