Hadoop Vs. Disco Vs. Condor?
I am trying to find a tool that will manage a bunch of jobs on 100 machines in a cluster (submit the jobs to the machines; make sure that jobs are run etc).
开发者_StackOverflowWhich tool would be more simple to install / manage:
(1) Hadoop?
(2) Disco?
(3) Condor?
Ideally, I am searching for a solution that would be as simple as possible, yet be robust.
Python
integration is also a plus.
I'm unfamiliar with Disco and Condor, but I can answer regarding Hadoop:
Hadoop pros:
- Robust and proven - probably more than anything else out there. Used by many organizations (including the one I work for) to run clusters of 100s of nodes and more.
- Large ecosystem = support + many subprojects to make life easier (e.g. Pig, Hive)
- Python support should be possible through the streaming MR feature, or maybe Jython?
Hadoop cons:
- Neither simple nor elegant (imho). You'll have to spend time learning.
Have you considered the Sun Grid Engine? http://wikis.sun.com/display/GridEngine/Home .
精彩评论