What is the biggest advantage of the Hydra process manager for MPI?
I'm studying the new process manager that now ships by default with MPICH2, but so far I can't figure out what the big advance of this implementation is. Does anyone know of a good tutorial, or have some experience with it?
The Argonne wiki is a bit too simple: http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
From the point of view of where I work, the biggest single advance is the scalability of process launching. Launching jobs with 8000+ tasks using the previous process launchers in MPICH2-based MPI implementations was unusably slow and would frequently fail due to timeouts or other network problems, which all but ruled out MPICH2-based MPIs for our largest jobs. Hydra, by contrast, has a good hierarchical launch model, which can also take advantage of your resource manager.
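To give a feel for why a hierarchical launch matters at this scale, here is a toy model comparing a flat launch (one launcher contacting every node in turn) with a tree launch (each started node starts a fixed number of children). This is only a sketch: it is not Hydra's actual algorithm, and the function names and fan-out values are assumptions made up for illustration.

```python
def flat_launch_steps(num_nodes: int) -> int:
    """Sequential steps when a single launcher starts every node itself."""
    return num_nodes


def tree_launch_rounds(num_nodes: int, fanout: int) -> int:
    """Rounds needed when each newly started node starts `fanout` children.

    Round 1 starts `fanout` nodes from the launcher, round 2 starts
    `fanout` children from each of those, and so on. The fan-out is a
    hypothetical parameter, not a documented Hydra setting.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    reached = 0   # nodes started so far (launcher not counted)
    frontier = 1  # nodes able to start children in the next round
    rounds = 0
    while reached < num_nodes:
        frontier *= fanout
        reached += frontier
        rounds += 1
    return rounds


if __name__ == "__main__":
    for fanout in (2, 32):
        print(fanout, tree_launch_rounds(8000, fanout), flat_launch_steps(8000))
```

In this model the number of rounds grows with the logarithm of the node count rather than linearly, which is the general reason tree-style launchers hit fewer timeouts on large jobs.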
The topology-aware allocation strategies are good, too, but compared to the difference between job startup failing (or taking hours) and jobs succeeding, that's a second-order effect.
I completely agree with Jonathan about the substantial improvement in job startup times. In addition, though, Hydra is generally much more useful and more robust than previous process managers in nearly every respect. It launches more reliably, has more features (process-core binding, format-based output file redirection, resource manager and batch scheduler integration, etc.), and has less cryptic error messages than any previous process manager for MPICH2.
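As a rough illustration of how a few of those features look on the command line, here is a small sketch that assembles an `mpiexec` invocation. The helper name is hypothetical, and the flag spellings (`-f`, `-bind-to`, `-outfile-pattern`) are assumptions based on recent Hydra releases; older MPICH2 versions spell some of them differently, so check `mpiexec -help` on your installation.

```python
import shlex


def build_hydra_command(executable, num_procs, hostfile=None,
                        bind_to=None, outfile_pattern=None):
    """Build an mpiexec argument list (hypothetical helper).

    Flag names are assumed from recent Hydra releases and may differ
    on older MPICH2 installations.
    """
    cmd = ["mpiexec"]
    # Global options go before the per-executable options.
    if hostfile is not None:
        cmd += ["-f", hostfile]                        # file listing hosts
    if bind_to is not None:
        cmd += ["-bind-to", bind_to]                   # e.g. "core"
    if outfile_pattern is not None:
        cmd += ["-outfile-pattern", outfile_pattern]   # per-process stdout
    cmd += ["-n", str(num_procs), executable]
    return cmd


if __name__ == "__main__":
    args = build_hydra_command("./a.out", 4, hostfile="hosts", bind_to="core")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(a) for a in args))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting problems.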
Another key consideration is that Hydra is actively maintained, while nearly all other process managers are deprecated and/or unsupported at this point. So if you report a bug in Hydra, it will likely get fixed, which is not true for MPD or remshell.
AFAIK, you can also use Hydra to launch non-MPI jobs, such as UPC programs, if you know what you are doing.