
Improving a thread scheduling strategy

I need help in enhancing a thread scheduling strategy I am working on.

Background

To set the context, I have a large number (20-30 thousand) of "tasks" that need to be executed. Each task can execute independently. In practice, execution times range from 40 ms to 5 minutes across tasks. Also, re-running an individual task takes the same amount of time as its first run.

I need options to control how these tasks are executed, so I have built a scheduling engine that schedules them according to various strategies. The most basic strategy is FCFS, i.e. my tasks get executed sequentially, one by one. The second is a batch strategy: the scheduler has a bucket size "b" which controls how many threads can run in parallel. The scheduler kicks off non-blocking threads for the first "b" tasks it gets, waits for those started tasks to complete, then proceeds with the next "b" tasks, again starting them in parallel and waiting for completion. Each set of "b" tasks processed at a time is termed a batch, hence batch scheduling.

Now, with batch scheduling, activity ramps up at the beginning of a batch as threads get created, peaks in the middle when most of the threads are running, and tapers off as we block and wait for the threads to join back in. Batch scheduling degenerates to FCFS scheduling when the batch size "b" = 1.
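A minimal sketch of this batch strategy in Python threads (the function name `run_in_batches` and the toy tasks are illustrative, not from the original engine):

```python
import threading

def run_in_batches(tasks, b):
    """Run callables in batches of b: start b threads, join them all, repeat.
    No thread of the next batch starts until the whole current batch joins."""
    for i in range(0, len(tasks), b):
        batch = [threading.Thread(target=t) for t in tasks[i:i + b]]
        for th in batch:
            th.start()
        for th in batch:
            th.join()  # the entire batch must finish before the next one begins

results = []
tasks = [lambda n=n: results.append(n) for n in range(7)]
run_in_batches(tasks, b=3)
print(sorted(results))  # → [0, 1, 2, 3, 4, 5, 6]
```

Note the cost this sketch makes visible: a 40 ms task that lands in the same batch as a 5-minute task leaves its thread idle until the slow task joins.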

One way to improve on batch scheduling is what I will term parallel scheduling: the scheduler ensures that, as long as enough tasks remain, "b" threads are running at any point in time. The thread count initially ramps up to "b" and is then held at "b" running threads until the last set of tasks finishes execution. To maintain "b" running threads at all times, we need to start a new thread the moment an old thread finishes. In the average case this reduces the total time taken to finish all the tasks compared to batch scheduling.
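This "keep b tasks in flight at all times" behavior is exactly what a bounded thread pool gives you. A sketch using Python's standard `concurrent.futures` (the function name `run_parallel` is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(tasks, b):
    """Keep up to b tasks running at all times: a new task starts the
    moment a worker frees up, with no batch boundary in between."""
    with ThreadPoolExecutor(max_workers=b) as pool:
        futures = [pool.submit(t) for t in tasks]
    # leaving the with-block waits for every submitted task to complete
    return [f.result() for f in futures]

print(run_parallel([lambda n=n: n * n for n in range(6)], b=3))
# → [0, 1, 4, 9, 16, 25]
```

Because workers pull the next task as soon as they finish, a slow task only occupies one of the b slots instead of stalling a whole batch.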

Part where I need help

The logic I have to implement parallel scheduling follows. I would be much obliged if anyone could help me with:

  • Can we avoid the use of the startedTasks list? I am using it because I need to be sure that when Commit() exits, all tasks have completed execution, so I loop through startedTasks and block until each one is complete. One problem with the current approach is that the list will grow very long.

--OR--

  • Is there a better way to do parallel scheduling?

(Any other suggestions/strategies are also welcome; the main goal here is to shorten the overall execution duration within the constraint of the batch size "b".)

ParallelScheduler pseudocode

// assume all variable access/updates are thread safe
Semaphore S: with an initial capacity of "b"
Queue<Task> tasks
List<Task> startedTasks
bool allTasksCompleted = false;
bool stopPolling = false;

// The following method is called by a callee
// that wishes to start tasks, it can be called any number of times
// passing various task items
METHOD void ScheduleTask( Task t )

    if the PollerThread has not been started yet then start it
    // starting PollerThead will call PollerThread_Action


    // set up the task so that when it is completed, it releases 1
    // on semaphore S
    // assume OnCompleted is executed when the task t completes
    // execution after a call to t.Start()
    t.OnCompleted() ==> S.Release(1)

    tasks.Enqueue ( t )


// This method is called when the callee
// wishes to notify that no more tasks are present that needs
// a ScheduleTask call.
METHOD void Commit()
    // assume that the following assignment is thread safe
    stopPolling = true;

    // assume that the following check is done efficiently
    wait until allTasksCompleted is set to true


// this is the method the poller thread once started will execute
METHOD void PollerThread_Action
    while ( !stopPolling )
        if ( tasks.Count > 0 )
            Task nextTask = tasks.Dequeue()
            // wait on the semaphore to release one unit
            if ( S.WaitOne() )              
                // start the task in a new thread
                nextTask.Start()
                startedTasks.Add( nextTask )
    // we have been asked to stop polling
    // this means no more tasks are going to be added
    // to the queue
    // finish off the remaining tasks
    while ( tasks.Count > 0 )
        Task nextTask = tasks.Dequeue()
        if ( S.WaitOne() )
            nextTask.Start()
            startedTasks.Add ( nextTask )

    // at this point, there are no more tasks in the queue
    // each task would have already been started at some
    // point
    for every Task t in startedTasks
        t.WaitUntilComplete() // this will block if a task is running, else exit immediately

    // now all tasks are complete
    allTasksCompleted = true
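One way to answer the first question: replace the startedTasks list with a counter of outstanding tasks guarded by a condition variable, so Commit() waits on a single number instead of iterating a long list. A Python sketch under that assumption (class and method names are mine, not from the original engine; this simplification also moves the semaphore wait into the caller of schedule(), which removes the poller thread as well):

```python
import threading

class ParallelScheduler:
    """b-bounded parallel scheduling without a startedTasks list:
    an outstanding-task counter plus a condition variable replaces
    the per-task WaitUntilComplete loop."""

    def __init__(self, b):
        self.sem = threading.Semaphore(b)   # at most b tasks run at once
        self.cond = threading.Condition()
        self.outstanding = 0

    def schedule(self, task):
        self.sem.acquire()                  # blocks while b tasks are running
        with self.cond:
            self.outstanding += 1
        threading.Thread(target=self._run, args=(task,)).start()

    def _run(self, task):
        try:
            task()
        finally:
            self.sem.release()              # frees a slot for the next task
            with self.cond:
                self.outstanding -= 1
                if self.outstanding == 0:
                    self.cond.notify_all()

    def commit(self):
        with self.cond:
            while self.outstanding:         # waits for every task, list-free
                self.cond.wait()

results = []
sched = ParallelScheduler(3)
for n in range(10):
    sched.schedule(lambda n=n: results.append(n * 2))
sched.commit()
print(sorted(results))  # all 10 results present
```

Memory is now O(1) regardless of how many tasks were scheduled, and completion order no longer matters: the counter reaches zero exactly when the last task finishes.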


Search for 'work stealing scheduler' - it is one of the most efficient generic schedulers. There are also several open source and commercial implementations around.

The idea is to have a fixed number of worker threads that take tasks from a queue. But to avoid contention on a single queue shared by all the threads (a serious performance problem on multi-CPU systems), each thread has its own queue. When a thread creates new tasks, it places them on its own queue. After finishing a task, the thread takes the next task from its own queue. But if its own queue is empty, it "steals" work from some other thread's queue.
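A toy sketch of the stealing idea (function name and structure are illustrative; real implementations such as work-stealing thread pools use lock-free deques, whereas this version uses plain locks for clarity and assumes tasks do not spawn further tasks):

```python
import collections
import random
import threading

def work_stealing_run(task_lists, n_workers):
    """Each worker pops from the front of its own deque and, when it
    runs dry, steals from the back of a randomly chosen victim."""
    deques = [collections.deque(ts) for ts in task_lists]
    locks = [threading.Lock() for _ in range(n_workers)]
    done = []

    def worker(i):
        while True:
            task = None
            with locks[i]:
                if deques[i]:
                    task = deques[i].popleft()      # own queue, front
            if task is None:                        # own queue empty: steal
                victims = [v for v in range(n_workers) if v != i]
                random.shuffle(victims)
                for v in victims:
                    with locks[v]:
                        if deques[v]:
                            task = deques[v].pop()  # victim's queue, back
                            break
                if task is None:
                    return                          # nothing left anywhere
            done.append(task())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Stealing from the opposite end of the victim's deque is the classic trick: the owner and the thief rarely contend for the same element.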


When your program knows a task needs to be run, place it in a queue data structure.

When your program starts up, also start up as many worker threads as you will need. Arrange for each thread to do a blocking read from the queue when it needs something to do. So, when the queue is empty or nearly so, most of your threads will be blocked waiting for something to go into the queue.

When the queue has plenty of tasks in it, each thread will pull one task from the queue and carry it out. When it is done, it will pull another task and do that one. Of course this means that tasks will be completed in a different order than they were started. Presumably that is acceptable.
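The scheme described above can be sketched with Python's standard `queue.Queue` (the function name `pool_run` and the one-sentinel-per-worker shutdown are my choices, not prescribed by the answer):

```python
import queue
import threading

def pool_run(tasks, n_workers):
    """Fixed pool of workers doing blocking reads from a shared queue;
    each worker pulls the next task the moment it finishes the last."""
    q = queue.Queue()
    results = []

    def worker():
        while True:
            task = q.get()      # blocks until a task (or sentinel) arrives
            if task is None:    # sentinel: no more work for this worker
                return
            results.append(task())

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in tasks:
        q.put(t)
    for _ in threads:
        q.put(None)             # one sentinel per worker shuts the pool down
    for t in threads:
        t.join()
    return results
```

As the answer notes, tasks complete in whatever order workers finish them, not in submission order.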

This is far superior to a strategy where you have to wait for all threads to finish their tasks before any one of them can get another task. If long-running tasks are relatively rare in your system, you may find that you don't have to do much more optimization. If long-running tasks are common, you may want to have separate queues and separate threads for short- and long-running tasks, so the short-running tasks don't get starved out by the long-running ones.

There is a hazard here: if some of your tasks are VERY long-running (that is, they never finish due to bugs) you'll eventually poison all your threads and your system will stop working.


You want to use a space-filling curve to subdivide the tasks. An SFC reduces a 2-D problem to a 1-D one.
