Experiences with message-based master-worker frameworks (Java/Python/.NET)
I am designing a distributed master-worker system which, from 10,000 feet, consists of:
- Web-based UI
- a master component, responsible for generating jobs according to a configurable set of algorithms
- a set of workers running on regular PCs, an HPC cluster, or even the cloud
- a digital repository
- messaging-based middleware
- different categories of tasks, with running times ranging from < 1s to ~6hrs. Tasks are computation-heavy rather than data/IO-heavy. The volume of tasks is not expected to be great (as far as I can see now), probably peaking at around 100/min.
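To make the task-category point concrete, here is a minimal sketch of one way to keep sub-second jobs from queueing behind 6-hour ones: give each category its own queue and its own pool of workers. Everything named here (`QUEUES`, `WORKERS_PER_CATEGORY`, `run_master`, the "short"/"long" categories) is hypothetical, and the in-process `queue.Queue` objects and threads only stand in for broker queues and worker processes on separate machines.

```python
import queue
import threading

# Hypothetical category names and worker counts. The in-process queues stand
# in for broker queues (for example, one queue per task category).
QUEUES = {"short": queue.Queue(), "long": queue.Queue()}
WORKERS_PER_CATEGORY = {"short": 2, "long": 1}
STOP = object()  # sentinel telling a worker to exit


def worker(category, results):
    """Consume jobs from one category's queue until the sentinel arrives."""
    q = QUEUES[category]
    while True:
        job = q.get()
        if job is STOP:
            break
        job_id, func, args = job
        results.put((job_id, category, func(*args)))


def run_master(jobs):
    """Dispatch (job_id, category, func, args) tuples and collect the results."""
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(category, results))
        for category, count in WORKERS_PER_CATEGORY.items()
        for _ in range(count)
    ]
    for t in threads:
        t.start()
    for job_id, category, func, args in jobs:
        QUEUES[category].put((job_id, func, args))
    # One sentinel per worker, queued behind the real jobs.
    for category, count in WORKERS_PER_CATEGORY.items():
        for _ in range(count):
            QUEUES[category].put(STOP)
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        job_id, category, value = results.get()
        collected[job_id] = (category, value)
    return collected
```

For example, `run_master([(1, "short", pow, (2, 3)), (2, "long", sum, ([1, 2, 3],))])` runs each job on a worker dedicated to its category. With a real broker the same split would be expressed as separate queues with separate consumers.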
Strictly speaking there is no need to move outside of the Windows ecosystem but I would be more comfortable with a cross-platform solution to keep options open (nb. some tasks are Windows only).
I have pretty much settled on RabbitMQ as the messaging layer, and Fedora Commons seems to be the most mature off-the-shelf repository. As for the master/worker logic, I am evaluating:
- Java-based: Grails + Postgres + DOSGi or GridGain with Zookeeper
- Python-based: Django + Postgres + Celery
- .NET-based: ASP.NET MVC + SQL Server + NServiceBus + SharePoint or Zentity as the repository
I have looked at various IoC/DI containers but doubt they are really the best fit for a task execution container, and they add extra layers and complexity. But maybe I'm wrong.
Currently I am leaning towards the Python solution (keep it lightweight), but I would be interested in any experiences/suggestions people have to share, particularly with the .NET stack. Open source/scalability/resilience features are plus points.
PS: A more advanced future requirement will be the ability for the user to connect directly to a running task (using a web UI) and influence its behaviour (real-time steering). A direct communication channel will be needed to do this (doing this over AMQP does not seem like a good idea).
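As a rough illustration of what such a direct channel could look like, the sketch below has the worker expose a small socket endpoint next to the running task, so steering messages bypass the broker entirely. It uses the standard library's `multiprocessing.connection`; the names `SteerableTask`, `serve_steering`, the `step_size` parameter and the `(name, value)` message format are all made up for the example. Note that this transport is pickle-based, so it is only suitable between trusted hosts.

```python
import threading
from multiprocessing.connection import Client, Listener


class SteerableTask:
    """A long-running loop whose parameters can be changed while it runs."""

    def __init__(self, steps, step_size=1.0):
        self.steps = steps
        self.params = {"step_size": step_size}  # hypothetical tunable
        self.lock = threading.Lock()
        self.total = 0.0

    def run(self):
        for _ in range(self.steps):
            # Re-read the parameter on every iteration so changes take effect.
            with self.lock:
                step = self.params["step_size"]
            self.total += step
        return self.total

    def steer(self, name, value):
        """Update a known parameter; return False for unknown names."""
        with self.lock:
            if name not in self.params:
                return False
            self.params[name] = value
            return True


def serve_steering(task, listener):
    """Accept one client and apply its (name, value) messages until None or EOF."""
    with listener.accept() as conn:
        while True:
            try:
                msg = conn.recv()
            except EOFError:
                break
            if msg is None:
                break
            name, value = msg
            conn.send(task.steer(name, value))
```

A worker would create `Listener(("127.0.0.1", 0), authkey=b"...")`, run `serve_steering` in a background thread, and publish `listener.address` (for instance via a message on the broker) so the web UI backend can connect with `Client(address, authkey=b"...")` and send `("step_size", 2.0)`.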
Dirk
With respect to the master/worker logic and the Java option:
Nimble (see http://www.paremus.com/products/products_nimble.html) with its OSGi Remote Services stack might provide an interesting and agile pure-OSGi approach. You still have to decide on a specific distribution mechanism. But given that the use case is computationally heavy and data-light, using the Essence RMI transport that ships with Nimble RSA, together with a simple front-end load-balancer function, might work really well.
A good approach to the 'direct communication channel' would be to leverage DDS. This is a low-latency publish/subscribe peer-to-peer messaging standard used in distributed command/control type environments. I think there is a bare-bones OSS project somewhere, but we (Paremus) work with RTI in this area.
Hope the above is of background interest.