Finding (number of) overlaps in a list of time ranges
Given a list of time ranges, I need to find the maximum number of overlaps.
Following is a dataset showing a 10 minute interval of calls, from which I am trying to find the maximum number of active lines in that interval. ie. from the example below, what is the maximum number of calls that were active at the same time:
CallStart CallEnd
2:22:22 PM 2:22:33 PM
2:22:35 PM 2:22:42 PM
2:22:36 PM 2:22:43 PM
2:22:46 PM 2:22:54 PM
2:22:49 PM 2:27:21 PM
2:22:57 PM 2:23:03 PM
2:23:29 PM 2:23:40 PM
2:24:08 PM 2:24:14 PM
2:27:37 PM 2:39:14 PM
2:27:47 PM 2:27:55 PM
2:29:04 PM 2:29:26 PM
2:29:31 PM 2:29:43 PM
2:29:45 PM 2:30:10 PM
If anyone knows an alogrithm or can point me in the right direction, I would be grateful.
TIA,
Steve F
Following must work:
- Sort all your time values and save Start or End state for each time value.
- Set
numberOfCalls
to 0 (count variable) Run through your time values and:
- increment numberOfCalls if time value marked as Start
- decrement numberOfCalls if time value marked as End
- keep track of maximum value of numberOfCalls during the process (and time values when it occurs)
Complexity: O(n log(n)) for sorting, O(n) to run through all records
Here is a working algorithm in Python
def maximumOverlap(calls):
times = []
for call in calls:
startTime, endTime = call
times.append((startTime, 'start'))
times.append((endTime, 'end'))
times = sorted(times)
count = 0
maxCount = 0
for time in times:
if time[1] == 'start':
count += 1 # increment on arrival/start
else:
count -= 1 # decrement on departure/end
maxCount = max(count, maxCount) # maintain maximum
return maxCount
calls = [
('2:22:22 PM', '2:22:33 PM'),
('2:22:35 PM', '2:22:42 PM'),
('2:22:36 PM', '2:22:43 PM'),
('2:22:46 PM', '2:22:54 PM'),
('2:22:49 PM', '2:27:21 PM'),
('2:22:57 PM', '2:23:03 PM'),
('2:23:29 PM', '2:23:40 PM'),
('2:24:08 PM', '2:24:14 PM'),
('2:27:37 PM', '2:39:14 PM'),
('2:27:47 PM', '2:27:55 PM'),
('2:29:04 PM', '2:29:26 PM'),
('2:29:31 PM', '2:29:43 PM'),
('2:29:45 PM', '2:30:10 PM'),
]
print(maximumOverlap(calls))
How about a naive approach:
- Take the least of the start times and the greatest of the end times (this is your range R)
- Take the shortest call duration -- d (sorting, O(nlog n))
- Create an array C, of ceil(R/d) integers, zero initialize
- Now, for each call, add 1 to the cells that define the call's duration O(n * ceil(R/d))
- Loop over the array C and save the max (O(n))
I guess you could model this as a graph too and fiddle around, but beats me at the moment.
In my opinion greedy algorithm will do the needful. The problem is similar to find out the number of platforms required for given trains timetable. So the number of overlaps will be the number of platforms required.
callStart times are sorted. Start putting each call in an array(a platform). So for call i and (i + 1)
, if callEnd[i] > callStart[i+1]
then they can not go in the same array (or platform) put as many calls in the first array as possible. Then repeat the process with rest ones till all calls are exhausted. In the end, number of arrays are maximum number of overlaps. And the complexity will be O(n)
.
The following page has examples of solving this problem in many languages: http://rosettacode.org/wiki/Max_Licenses_In_Use
You short the list on CallStart
. Then for each element (i
) you see for all j < i
if
CallEnd[j] > CallStart[i] // put it in a map with CallStart[i] as the key and some count
Rest should be easy enough.
It's amazing how for some problems solutions sometimes just pop out of one mind... and I think I probably the simplest solution ;)
You can represent the times in seconds, from the beginning of your range (0) to its end (600). A call is a pair of times.
Python algorithm:
def maxSimultaneousCalls(calls):
"""Returns the maximum number of simultaneous calls
calls : list of calls
(represented as pairs [begin,end] with begin and end in seconds)
"""
# Shift the calls so that 0 correspond to the beginning of the first call
min = min([call[0] for call in calls])
tmpCalls = [(call[0] - min, call[1] - min) for call in calls]
max = max([call[1] for call in tmpCalls])
# Find how many calls were active at each second during the interval [0,max]
seconds = [0 for i in range(0,max+1)]
for call in tmpCalls:
for i in range(call[0],call[1]):
seconds[i] += 1
return max(seconds)
Note that I don't know which calls were active at this time ;)
But in term of complexity it's extremely trivial to evaluate: it's linear in term of the total duration of the calls.
I think an important element of good solution for this problem is to recognize that each end time is >= the call's start time and that the start times are ordered. So rather than thinking in terms of reading the whole list and sorting we only need to read in order of start time and merge from a min-heap of the end times. This also addresses the comment Sanjeev made about how ends should be processed before starts when they have the exact same time value by polling from the end time min-heap and choosing it when it's value is <= the next start time.
max_calls = 0
// A min-heap will typically do the least amount of sorting needed here.
// It's size tells us the # of currently active calls.
// Note that multiple keys with the same value must be supported.
end_times = new MinHeap()
for call in calls:
end_times.add(call.end)
while (end_times.min_key() <= call.start) {
end_times.remove_min()
}
// Check size after calls have ended so that a start at the same time
// doesn't count as an additional call.
// Also handles zero duration calls as not counting.
if (end_times.size() > max_calls) max_calls = end_times.size()
}
This seems like a reduce
operation. The analogy is that each time a call is started, the current number of active calls is increased by 1. Each time a call is ended, the current number of calls drops to zero.
Once you have that stream of active calls all you need is to apply a max operation to them. Here is a working python2 example:
from itertools import chain
inp = ((123, 125),
(123, 130),
(123, 134),
(130, 131),
(130, 131),
(130, 132),)
# technical: tag each point as start or end of a call
data = chain(*(((a, 'start'), (b, 'end')) for a, b in inp))
def r(state, d):
last = state[-1]
# if a call is started we add one to the number of calls,
# if it ends we reduce one
current = (1 if d[1] == 'start' else -1)
state.append(last + current)
return state
max_intersect = max(reduce(r, sorted(data), [0]))
print max_intersect
精彩评论