how to tackle this combinatorial algorithm problem
I have N
people who must each take T
exams. Each exam takes "some" time, e.g. 30 min (no such thing as finishing early). Exams must be performed in front of an examiner.
I need to schedule each person to take each exam in front of an examiner within an overall time period but avoiding a lunch break, using the minimum number of examiners for the minimum amount of time (i.e. no/minimum examiners idle)
There are the following restrictions:
- No person can be in 2 places at once
- each person must take each exam once
- noone should be examined by the same examiner twice
I realise that an optimal solution is probably NP-Complete, and that I'm probably best off using a genetic algorithm to obtain a best estimate (similar to this? Seating plan software recommendations (does such a beast even exist?)).
I'm comfortable with how genetic algorithms work, what i'm struggling with is how to model the problem programatically such that i CAN manipulate the parameters genetically..
If each exam took the same amount of time, then i'd divide the time period up into these lengths, and simply create a matrix of time slots vs examiners and drop the candidates in. However because the times of each test are not necessarily the same, i'm a bit lost on how to approach this.
currently im doing this:
- make a list of all "tests" 开发者_C百科which need to take place, between every candidate and exam
- start with as many examiners as there are tests
- repeatedly loop over all examiners, for each one: find an unscheduled test which is eligible for the examiner (based on the restrictions)
- continue until all tests that can be scheduled, are
- if there are any unscheduled tests, increment the number of examiners and start again.
i'm looking for better suggestions on how to approach this, as it feels rather crude currently.
As julienaubert proposed, a solution (which I will call schedule) is a sequence of tuples (date, student, examiner, test) that covers all relevant student-test combinations (do all N students take all T tests?). I understand that a single examiner may test several students at once. Otherwise, a huge number of examiners will be required (at least one per student), which I really doubt.
Two tuples A and B conflict if
- the student is the same, the test is different, and the time-period overlaps
- the examiner is the same, the test is different, and the time-period overlaps
- the student has already worked with the examiner on another test
Notice that tuple conflicts are different from schedule conflicts (which must additionally check for the repeated examiner problem).
Lower bounds:
- the number E of examiners must be >= the total number of the tests of the most overworked student
- total time must be greater than the total length of the tests of the most overworked student.
A simple, greedy schedule can be constructed by the following method:
- Take the most overworked student and assigning tests in random order, each with a different examiner. Some bin-packing can be used to reorder the tests so that the lunch hour is kept free. This will be a happy student: he will finish in the minimum possible time.
- For each other student, if the student must take any test already scheduled, share time, place and examiner with the previously-scheduled test.
- Take the most overworked student (as in: highest number of unscheduled tests), and assign tuples so that no constraints are violated, adding more time and examiners if necessary
- If any students have unscheduled tests, goto 2.
Improving the choices made in step 2 above is critical to improvement; this choice can form the basis for heuristic search. The above algorithm tries to minimize the number of examiners required, at the expense of student time (students may end up with one exam early on and another last thing in the day, nothing in between). However, it is guaranteed to yield legal solutions. Running it with different students can be used to generate "starting" solutions to a GA that sticks to legal answers.
In general, I believe there is no "perfect answer" to a problem like this one, because there are so many factors that must be taken into account: students would like to minimize their total time spent examining themselves, as would examiners; we would like to minimize the number of examiners, but there are also practical limitations to how many students we can stack in a room with a single examiner. Also, we would like to make the scheduling "fair", so nobody is clearly getting much worse than others. So the best you can hope to do is to allow these knobs to be fiddled with, the results to be known (total time, per-student happiness, per-examiner happiness, exam sizes, perceived fairness) and allow the user to explore the parameter space and make an informed choice.
I'd recommend using a SAT solver for this. Although the problem is probably NP-hard, good SAT solvers can often handle hundreds of thousands of variables. Check out Chaff or MiniSat for two examples.
Don't limit yourself to genetic algorithms prematurely, there are many other approaches.
To be more specific, genetic algorithms are only really useful if you can combine parts of two solutions into a new one. This looks rather hard for this problem, at least if there are a similar number of people and exams so that most of them interact directly.
Here is a take on how you could model it with GA.
Using your notation:
- N (nr exam-takers)
- T (nr exams)
Let the gene of an individual express a complete schedule of bookings. i.e. an individual is a list of specific bookings: (i,j,t,d)
- i is the i'th exam-taker [1,N]
- j is the j'th examiner [1,?]
- t is the t'th test the exam-taker must take [1,T]
- d is the start of the exam (date+time)
evaluate using a fitness function which has the property to:
- penalize (severly) for all double booked examiners
- penalize for examiners idle-time
- penalize for exam-takers who were not allocated within their time period
- reward for each exam-taker's test which was booked within period
this function will have all the logic to determine double bookings etc.. you have the complete proposed schedule in the individual, you now run the logic knowing the time for each test at each booking to determine the fitness and you increase/decrease the score of the booking accordingly.
to make this work well, consider:
- initiation - use as much info you have to make "sane" bookings if it is computationally cheap.
- define proper GA operators
initiating in a sane way:
- random select d within the time period
- randomly permute (1,2,..,N) and then pick the i from this (avoids duplicates), same for j and t
proper GA operators:
say you have booking a and b: (a_i,a_j,a_t,a_d) (b_i,b_j,b_t,b_d)
you can swap a_i and b_i and you can swap a_j and b_j and a_d and b_d, but likely no point in swapping a_t and b_t.
you can also have cycling, best illustrated with an example, if N*T = 4 a complete booking is 4 tuples and you would then cycle along i or j or d, for example cycle along i:
a_i = b_i
b_i = c_i
c_i = d_i
d_i = a_i
You might also consider constraint programming. Check out Prolog or, for a more modern expression of logic programming, PyKE
精彩评论