How to use database server for distributed job scheduling?

2022-12-11 23:18 问答作者：

I have around 100 computers and few workers on each of them. The already connect to a central database to query for job parameters.

Now I have to do job schedulin开发者_开发问答g for them. One job for one worker takes few minutes, doesn't require network connection (except for dealing jobs and reporting) and can be done at any time in any order.

Constraints:

no job will be taken/done twice
there will be a timeeout in case the worker dies.

I thought that I could use a separate table in DB to schedule jobs. How to create and access job scheduling table?

Break it down into pieces:

You have a job description - which may have some constraints on where its performed - and you have a queue of jobs to be performed, broadly speaking if there are no other constraints you'd expect the jobs to be done in order i.e. you take them from the front of the queue and add them to the end.

If we run the queue as a single table then jobs will have 3 states:

Not Started
In Progress
Completed

So looking for work to do is simply a matter of finding the first job (or the first n jobs if work is to be assigned in a batch) that are not started and then flagging them as started. If the constraints are more complex than simply the first n available jobs then it becomes the first n available jobs that meet the constraints, but it should still be fairly simple.

That implies the following fields in the queue table:

Status
DateQueued (date and time) for sort
DateStarted (date and time) for timeout
AssignedTo

One should probably add a DateCompleted - but if the work is batched that's not going to be strictly accurate (it'll be time reported as complete).

So for a worker (the worker "app") the process becomes:

Connect to server
Report completed work - set status and completion time
Request new work
1. Search for new work for worker (first n jobs not started that worker can do)
2. Assign new work to worker (set status, Date Started and assigned to) - search and assign as a transaction.
List work and disconnect

Separately you'd need processes to queue work, to look for jobs that have "timed out" so that the status can be reset and to archive or otherwise clear out completed jobs from the queue.

Full table would have the following plus any audit fields required.

ID
JobID -- Assuming that jobs are defined elsewhere
StatusID
DateQueued
DateStarted
AssignedToID
DateCompleted

Hope that helps...

The interesting part, and the part where all of the difficulties lie, is in wrapping things up in transactions.

You'll need two tables: A table of available work, and a table recording work that is in progress. The "work in progress" table has a unique foreign key to the work-available table.

A process wishing to do work first locates a row from the table of work to be done. This should be done using a random sort order, in order to reduce contention.

That process the removes the "work in progress" row. It was never meant to be persist outside the transaction. It's only for locking.

That process then starts a transaction.

That process then creates a row in the "work in progress" table, with a foreign key referencing the work that is being done. It should then do the work. As a part of doing that work, it should change the state of the item being worked on (e.g., making it "finished" and no longer available to be worked on).

The process the commits its transaction.

If some other process has grabbed the work, then this process's transaction will fail due to its attempt to commit a duplicate foreign key to the "work in progress" table. In that case, the process should back off for a short, random interval, and to back to the start, trying to locate some work to do.

Monitor the "work in progress" table carefully. Some databases, or some versions of some database, don't expect a table such work-in-progress table to be used as a queue, with rows constantly being created and deleted. Specifically, older versions of Postgresql had difficulty cleaning up the old, no longer used rows, causing table bloat and poor performance.

继续阅读：database distributed parallel-processing scheduled-tasks

How to use database server for distributed job scheduling?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？