Design pattern advice: push model vs. pull model
My application has several workers (separate processes, each doing a different kind of work) and some resources (work units). Every worker needs to process every work unit. For example, with workers W1, W2 and W3 and work units U1 and U2, W1 needs to process both U1 and U2, and so do W2 and W3. The restriction is that different workers cannot work on the same work unit at the same time.
I have two designs and would like advice on which one is better.
- Push model: using a central job scheduler to assign work units to different workers, to ensure different workers are not working on the same work unit;
- Pull model: each worker asks a central job scheduler for work units to process, and the job scheduler selects an appropriate work unit that is not being processed by another worker and hands it to the asking worker.
I want to know the pros and cons of each design. One of my major concerns is finding a loosely coupled design (it is one of my major goals, but not the only one). I am not sure whether the push model or the pull model has better extensibility (is option 1 more loosely coupled?).
thanks in advance, George
The advantage of the "pull" model is that each worker knows locally how loaded it is and can therefore manage its own load.
Also, the "pull" model might be more "decoupled", as the "load" state is kept local to the worker, whereas the "push" model needs a communication protocol (with its overhead) to convey this state to the scheduler.
Think of the success of the "pull" model in the auto industry: it went from the traditional "push" model, where inventories were difficult to track and required lots of feedback, to the now successful and ubiquitous "pull" model.
When it comes to scaling, you can have an intermediate layer of "schedulers" that "poll" for jobs from the layer above. The base workers can now interact with the intermediate layer in a partitioned way.
Note that in either model a coordination protocol is required; it is the nature of that protocol that differs. In the "push" model, an additional control loop is required to report or poll the "load factor" of each worker. As one scales the system, this costs more bandwidth, more state on the scheduler side, and more latency.
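To make the extra control loop concrete, here is a minimal single-process sketch of a push scheduler. The class `PushScheduler` and its methods `report_load`, `assign` and `complete` are hypothetical names invented for this illustration, not part of any library, and the policy of picking the least-loaded worker is just an assumption. The point is only that the scheduler has to hold per-worker load state that the workers must keep reporting.

```python
class PushScheduler:
    """Hypothetical push-model scheduler: it decides who gets which unit."""

    def __init__(self, units):
        self._units = list(units)
        self._busy = set()   # units currently assigned to some worker
        self._load = {}      # worker_id -> last reported load (extra state)
        self._done = {}      # worker_id -> units that worker has finished

    def report_load(self, worker_id, load):
        # The additional control loop: workers must push their load upstream.
        self._load[worker_id] = load
        self._done.setdefault(worker_id, set())

    def assign(self):
        """Return (worker_id, unit) for the least-loaded worker that still
        has a free, unprocessed unit, or None if nothing can be assigned."""
        for worker_id in sorted(self._load, key=self._load.get):
            for unit in self._units:
                if unit not in self._busy and unit not in self._done[worker_id]:
                    self._busy.add(unit)
                    return worker_id, unit
        return None

    def complete(self, worker_id, unit):
        # The worker reports back so the unit can go to another worker.
        self._busy.discard(unit)
        self._done[worker_id].add(unit)
```

Note that the scheduler only ever sees the last reported load, so its decisions can be based on stale information unless workers report frequently, which is exactly the bandwidth and latency cost described above.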
I would certainly use the Pull model as it is simpler to implement.
I can only imagine 2 implementations:
Pull model = 1 service with the tasks collection plus many worker clients.
Push model = 1 service with the tasks collection and a list of active subscribers plus many active subscribers (workers).
As the Pull model doesn't have to implement full-duplex service calls or a subscriber list, it is simpler.
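As a rough illustration of the pull variant, here is a small sketch in which threads stand in for the worker processes. `PullScheduler`, `run_worker` and `demo` are hypothetical names made up for this example, and a real deployment would put the scheduler behind some service interface instead of sharing memory. The scheduler keeps only the task collection and hands out a unit that no other worker currently holds and that the asking worker has not processed yet.

```python
import threading
import time


class PullScheduler:
    """Hypothetical pull-model service: workers ask, the scheduler answers."""

    def __init__(self, units):
        self._units = list(units)
        self._busy = set()   # units currently held by some worker
        self._done = {}      # worker_id -> units that worker has finished
        self._lock = threading.Lock()

    def acquire(self, worker_id):
        """Return a free unit this worker still has to process, or None."""
        with self._lock:
            finished = self._done.setdefault(worker_id, set())
            for unit in self._units:
                if unit not in self._busy and unit not in finished:
                    self._busy.add(unit)
                    return unit
            return None

    def release(self, worker_id, unit):
        with self._lock:
            self._busy.discard(unit)
            self._done.setdefault(worker_id, set()).add(unit)

    def is_finished(self, worker_id):
        with self._lock:
            return self._done.get(worker_id, set()) >= set(self._units)


def run_worker(scheduler, worker_id, log):
    # The worker drives the loop: it asks for work only when it is ready.
    while not scheduler.is_finished(worker_id):
        unit = scheduler.acquire(worker_id)
        if unit is None:
            time.sleep(0.001)  # nothing free right now, ask again shortly
            continue
        log.append((worker_id, unit))  # stand-in for the real processing
        scheduler.release(worker_id, unit)


def demo():
    scheduler = PullScheduler(["U1", "U2"])
    log = []
    threads = [
        threading.Thread(target=run_worker, args=(scheduler, w, log))
        for w in ("W1", "W2", "W3")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `demo()` returns the list of (worker, unit) pairs in the order they were processed. The order depends on thread scheduling, but every worker ends up processing every unit, and the scheduler never needs to know who its workers are or how loaded they are.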