Azure Worker Role Design
I am trying to design an Azure worker role routine. A worker role polls a job queue, and each job message specifies the number of threads the job requires. A job is a running instance of an executable, for example Rax.exe, which can run on a varying number of threads: calling Rax.exe -T 2 makes it create two threads. So we do not have to manage thread creation ourselves; we just call Rax.exe with the appropriate command-line argument. I have Extra-Large worker instances, so I can run 8 threads simultaneously. I want to utilize the workers as much as I can. We may have many jobs, each specifying a different number of threads.
Example:
Job Queue:
1 Rax.exe -T 3
2 Rax.exe -T 5
3 Rax.exe -T 1
4 Rax.exe -T 8
5 Rax.exe -T 4
In this example, we have 5 jobs. A worker reads the first message and starts the job. This job consumes 3 threads. A worker can have 8 threads so the remaining 5 threads can be utilized by running another job from the queue.
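The packing in this example can be sketched as a small simulation. This is only an illustrative sketch in Python, not worker role code: the function name `admit_jobs` and the `(job_id, threads)` tuple layout are assumptions made for the example, and the capacity of 8 comes from the Extra-Large instance size mentioned above.

```python
from collections import deque

CAPACITY = 8  # threads available on one Extra-Large instance (from the question)

def admit_jobs(pending, free_threads):
    """Start every queued job that fits in the free threads, scanning in queue order.

    `pending` is a deque of (job_id, threads) tuples (a hypothetical layout).
    Jobs that do not fit stay queued in their original order.
    Returns (started_jobs, remaining_free_threads).
    """
    started = []
    skipped = deque()
    while pending:
        job_id, threads = pending.popleft()
        if threads <= free_threads:
            free_threads -= threads
            started.append((job_id, threads))
        else:
            skipped.append((job_id, threads))
    pending.extend(skipped)  # put the jobs that did not fit back in the queue
    return started, free_threads

# The five jobs from the example queue above.
queue = deque([(1, 3), (2, 5), (3, 1), (4, 8), (5, 4)])
started, free = admit_jobs(queue, CAPACITY)
print(started, free, list(queue))
```

With the example queue, jobs 1 and 2 start together and fill all 8 threads, while jobs 3, 4 and 5 wait. One caveat of skipping over jobs that do not fit is that a wide job such as job 4 (8 threads) can be delayed indefinitely if smaller jobs keep arriving.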
Currently, I do not know how to run multiple processes inside a worker role. I am using the WaitForExit method of the Process class, which blocks until the process exits. Each running instance of the executable creates output files, so I have to collect those generated files.
My Questions:
1- How can I start multiple processes asynchronously and be notified when they exit? I have to do this while still polling the job queue.
2- Is this kind of job scheduling a hard problem? Can anyone come up with a good heuristic?
EDIT: I think estimating the required running time for each job would be helpful. This kind of information exists. With this information, can the problem be solved?
1- How can I start multiple processes asynchronously and be notified when they exit? I have to do this while still polling the job queue.
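This one's quite simple: instead of using WaitForExit, you can subscribe to the Exited event of the Process class. Note that Exited is only raised if EnableRaisingEvents is set to true on the process.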
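To illustrate the non-blocking pattern in a language-neutral way, here is a minimal Python sketch. Python's `subprocess` module has no equivalent of the Exited event, so this sketch swaps in polling with `Popen.poll()` instead; in .NET the event handler would take the place of the `reap_finished` step. The helper names `launch` and `reap_finished` are hypothetical, and the launched command is a stand-in for Rax.exe.

```python
import subprocess
import sys
import time

def launch(job_id, threads):
    # Stand-in command: a real worker would run something like
    # ["Rax.exe", "-T", str(threads)] here.
    cmd = [sys.executable, "-c", "pass"]
    return {"id": job_id, "threads": threads, "proc": subprocess.Popen(cmd)}

def reap_finished(running):
    """Split running jobs into (still_running, finished) without blocking."""
    still_running, finished = [], []
    for job in running:
        # poll() returns None while the process is alive, else its exit code.
        if job["proc"].poll() is None:
            still_running.append(job)
        else:
            finished.append(job)
    return still_running, finished

running = [launch(1, 3), launch(2, 5)]
completed = []
while running:
    running, finished = reap_finished(running)
    for job in finished:
        # This is the point to collect output files and release job["threads"].
        completed.append((job["id"], job["proc"].returncode))
    time.sleep(0.05)  # a real loop would poll the job queue here instead
```

The key point is that nothing in the loop waits on a single process, so the worker stays free to read new messages and start more jobs whenever threads are released.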
2- Is this kind of job scheduling a hard problem? Can anyone come up with a good heuristic?
As Erno has suggested in his comment, one good way to solve this problem is to hand it off to the Task Parallel Library (TPL). While a general multi-thread scheduling algorithm might not provide the optimal schedule, it can provide a really good solution for very little effort, and as the work becomes more complex, the general scheduling algorithm can sometimes outperform a hand-crafted solution.
If you are interested in scheduling approaches for batch processes on Azure, then it might be worth looking at some of the map-reduce type projects on Azure:
- lokad.cloud - http://code.google.com/p/lokad-cloud/
- hadoop in Azure - http://blogs.msdn.com/b/mariok/archive/2011/05/11/hadoop-in-azure.aspx
- Dryad - http://research.microsoft.com/en-us/projects/DryadLINQ/
- Grid - http://azuregrid.codeplex.com/
While these approaches are mainly about distributing work across multiple machines, the same kind of approach can apply to distributing work across multiple cores within the same machine.
You should use multiple Worker Role instances.
This is how multiprocessing is done on the Azure platform: you can have more than one role instance grabbing items off the same queue, which is how the system has been designed to work.