开发者

Merging a list of time-range tuples that have overlapping time-ranges

I have a list of tuples where each tuple is a (start-time, end-time). I am trying to merge all overlapping time ranges and return a list of distinct time ranges. For example

[(1, 5), (2, 4), (3, 6)] -开发者_StackOverflow社区-->  [(1,6)]
[(1, 3), (2, 4), (5, 8)] --->  [(1, 4), (5,8)]

Here is how I implemented it.

# Algorithm
# initialranges: [(a,b), (c,d), (e,f), ...]
# First we sort each tuple then whole list.
# This will ensure that a<b, c<d, e<f ... and a < c < e ... 
# BUT the order of b, d, f ... is still random
# Now we have only 3 possibilities
#================================================
# b<c<d: a-------b           Ans: [(a,b),(c,d)]
#                  c---d
# c<=b<d: a-------b          Ans: [(a,d)]
#               c---d
# c<d<b: a-------b           Ans: [(a,b)]
#         c---d
#================================================
def mergeoverlapping(initialranges):
    i = sorted(set([tuple(sorted(x)) for x in initialranges]))

    # initialize final ranges to [(a,b)]
    f = [i[0]]
    for c, d in i[1:]:
        a, b = f[-1]
        if c<=b<d:
            f[-1] = a, d
        elif b<c<d:
            f.append((c,d))
        else:
            # else case included for clarity. Since 
            # we already sorted the tuples and the list
            # only remaining possibility is c<d<b
            # in which case we can silently pass
            pass
    return f

I am trying to figure out if

  1. Is the a an built-in function in some python module that can do this more efficiently? or
  2. Is there a more pythonic way of accomplishing the same goal?

Your help is appreciated. Thanks!


A few ways to make it more efficient, Pythonic:

  1. Eliminate the set() construction, since the algorithm should prune out duplicates during in the main loop.
  2. If you just need to iterate over the results, use yield to generate the values.
  3. Reduce construction of intermediate objects, for example: move the tuple() call to the point where the final values are produced, saving you from having to construct and throw away extra tuples, and reuse a list saved for storing the current time range for comparison.

Code:

def merge(times):
    saved = list(times[0])
    for st, en in sorted([sorted(t) for t in times]):
        if st <= saved[1]:
            saved[1] = max(saved[1], en)
        else:
            yield tuple(saved)
            saved[0] = st
            saved[1] = en
    yield tuple(saved)

data = [
    [(1, 5), (2, 4), (3, 6)],
    [(1, 3), (2, 4), (5, 8)]
    ]

for times in data:
    print list(merge(times))


Sort tuples then list, if t1.right>=t2.left => merge and restart with the new list, ...

-->

def f(l, sort = True):
    if sort:
        sl = sorted(tuple(sorted(i)) for i in l)
    else:
        sl = l
    if len(sl) > 1:
        if sl[0][1] >= sl[1][0]:
            sl[0] = (sl[0][0], sl[1][1])
            del sl[1]
            if len(sl) < len(l):
                return f(sl, False)
    return sl


The sort part: use standard sorting, it compares tuples the right way already.

sorted_tuples = sorted(initial_ranges)

The merge part. It eliminates duplicate ranges, too, so no need for a set. Suppose you have current_tuple and next_tuple.

c_start, c_end = current_tuple
n_start, n_end = next_tuple
if n_start <= c_end: 
  merged_tuple = min(c_start, n_start), max(c_end, n_end)

I hope the logic is clear enough.

To peek next tuple, you can use indexed access to sorted tuples; it's a wholly known sequence anyway.


Sort all boundaries then take all pairs where a boundary end is followed by a boundary start.

def mergeOverlapping(initialranges):
    def allBoundaries():
        for r in initialranges:
            yield r[0], True
            yield r[1], False

    def getBoundaries(boundaries):
        yield boundaries[0][0]
        for i in range(1, len(boundaries) - 1):
            if not boundaries[i][1] and boundaries[i + 1][1]:
                yield boundaries[i][0]
                yield boundaries[i + 1][0]
        yield boundaries[-1][0]

    return getBoundaries(sorted(allBoundaries()))

Hm, not that beautiful but was fun to write at least!

EDIT: Years later, after an upvote, I realised my code was wrong! This is the new version just for fun:

def mergeOverlapping(initialRanges):
    def allBoundaries():
        for r in initialRanges:
            yield r[0], -1
            yield r[1], 1

    def getBoundaries(boundaries):
        openrange = 0
        for value, boundary in boundaries:
            if not openrange:
                yield value
            openrange += boundary
            if not openrange:
                yield value

    def outputAsRanges(b):
        while b:
            yield (b.next(), b.next())

    return outputAsRanges(getBoundaries(sorted(allBoundaries())))

Basically I mark the boundaries with -1 or 1 and then sort them by value and only output them when the balance between open and closed braces is zero.


Late, but might help someone looking for this. I had a similar problem but with dictionaries. Given a list of time ranges, I wanted to find overlaps and merge them when possible. A little modification to @samplebias answer led me to this:

Merge function:

def merge_range(ranges: list, start_key: str, end_key: str):
    ranges = sorted(ranges, key=lambda x: x[start_key])
    saved = dict(ranges[0])

    for range_set in sorted(ranges, key=lambda x: x[start_key]):
        if range_set[start_key] <= saved[end_key]:
            saved[end_key] = max(saved[end_key], range_set[end_key])
        else:
            yield dict(saved)
            saved[start_key] = range_set[start_key]
            saved[end_key] = range_set[end_key]
    yield dict(saved)

Data:

data = [
    {'start_time': '09:00:00', 'end_time': '11:30:00'},
    {'start_time': '15:00:00', 'end_time': '15:30:00'},
    {'start_time': '11:00:00', 'end_time': '14:30:00'},
    {'start_time': '09:30:00', 'end_time': '14:00:00'}
]

Execution:

print(list(merge_range(ranges=data, start_key='start_time', end_key='end_time')))

Output:

[
    {'start_time': '09:00:00', 'end_time': '14:30:00'},
    {'start_time': '15:00:00', 'end_time': '15:30:00'}
]


When using Python 3.7, following the suggestion given by “RuntimeError: generator raised StopIteration” every time I try to run app, the method outputAsRanges from @UncleZeiv should be:

    def outputAsRanges(b):
        while b:
            try:
                yield (next(b), next(b))
            except StopIteration:
                return
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜