Merging a list of time-range tuples that have overlapping time-ranges

2023-02-25 15:26 问答作者：

I have a list of tuples where each tuple is a (start-time, end-time). I am trying to merge all overlapping time ranges and return a list of distinct time ranges. For example

[(1, 5), (2, 4), (3, 6)] -开发者_StackOverflow社区-->  [(1,6)]
[(1, 3), (2, 4), (5, 8)] --->  [(1, 4), (5,8)]

Here is how I implemented it.

# Algorithm
# initialranges: [(a,b), (c,d), (e,f), ...]
# First we sort each tuple then whole list.
# This will ensure that a<b, c<d, e<f ... and a < c < e ... 
# BUT the order of b, d, f ... is still random
# Now we have only 3 possibilities
#================================================
# b<c<d: a-------b           Ans: [(a,b),(c,d)]
#                  c---d
# c<=b<d: a-------b          Ans: [(a,d)]
#               c---d
# c<d<b: a-------b           Ans: [(a,b)]
#         c---d
#================================================
def mergeoverlapping(initialranges):
    i = sorted(set([tuple(sorted(x)) for x in initialranges]))

    # initialize final ranges to [(a,b)]
    f = [i[0]]
    for c, d in i[1:]:
        a, b = f[-1]
        if c<=b<d:
            f[-1] = a, d
        elif b<c<d:
            f.append((c,d))
        else:
            # else case included for clarity. Since 
            # we already sorted the tuples and the list
            # only remaining possibility is c<d<b
            # in which case we can silently pass
            pass
    return f

I am trying to figure out if

Is the a an built-in function in some python module that can do this more efficiently? or
Is there a more pythonic way of accomplishing the same goal?

Your help is appreciated. Thanks!

A few ways to make it more efficient, Pythonic:

Eliminate the set() construction, since the algorithm should prune out duplicates during in the main loop.
If you just need to iterate over the results, use yield to generate the values.
Reduce construction of intermediate objects, for example: move the tuple() call to the point where the final values are produced, saving you from having to construct and throw away extra tuples, and reuse a list saved for storing the current time range for comparison.

Code:

def merge(times):
    saved = list(times[0])
    for st, en in sorted([sorted(t) for t in times]):
        if st <= saved[1]:
            saved[1] = max(saved[1], en)
        else:
            yield tuple(saved)
            saved[0] = st
            saved[1] = en
    yield tuple(saved)

data = [
    [(1, 5), (2, 4), (3, 6)],
    [(1, 3), (2, 4), (5, 8)]
    ]

for times in data:
    print list(merge(times))

Sort tuples then list, if t1.right>=t2.left => merge and restart with the new list, ...

-->

def f(l, sort = True):
    if sort:
        sl = sorted(tuple(sorted(i)) for i in l)
    else:
        sl = l
    if len(sl) > 1:
        if sl[0][1] >= sl[1][0]:
            sl[0] = (sl[0][0], sl[1][1])
            del sl[1]
            if len(sl) < len(l):
                return f(sl, False)
    return sl

The sort part: use standard sorting, it compares tuples the right way already.

sorted_tuples = sorted(initial_ranges)

The merge part. It eliminates duplicate ranges, too, so no need for a set. Suppose you have current_tuple and next_tuple.

c_start, c_end = current_tuple
n_start, n_end = next_tuple
if n_start <= c_end: 
  merged_tuple = min(c_start, n_start), max(c_end, n_end)

I hope the logic is clear enough.

To peek next tuple, you can use indexed access to sorted tuples; it's a wholly known sequence anyway.

Sort all boundaries then take all pairs where a boundary end is followed by a boundary start.

def mergeOverlapping(initialranges):
    def allBoundaries():
        for r in initialranges:
            yield r[0], True
            yield r[1], False

    def getBoundaries(boundaries):
        yield boundaries[0][0]
        for i in range(1, len(boundaries) - 1):
            if not boundaries[i][1] and boundaries[i + 1][1]:
                yield boundaries[i][0]
                yield boundaries[i + 1][0]
        yield boundaries[-1][0]

    return getBoundaries(sorted(allBoundaries()))

Hm, not that beautiful but was fun to write at least!

EDIT: Years later, after an upvote, I realised my code was wrong! This is the new version just for fun:

def mergeOverlapping(initialRanges):
    def allBoundaries():
        for r in initialRanges:
            yield r[0], -1
            yield r[1], 1

    def getBoundaries(boundaries):
        openrange = 0
        for value, boundary in boundaries:
            if not openrange:
                yield value
            openrange += boundary
            if not openrange:
                yield value

    def outputAsRanges(b):
        while b:
            yield (b.next(), b.next())

    return outputAsRanges(getBoundaries(sorted(allBoundaries())))

Basically I mark the boundaries with -1 or 1 and then sort them by value and only output them when the balance between open and closed braces is zero.

Late, but might help someone looking for this. I had a similar problem but with dictionaries. Given a list of time ranges, I wanted to find overlaps and merge them when possible. A little modification to @samplebias answer led me to this:

Merge function:

def merge_range(ranges: list, start_key: str, end_key: str):
    ranges = sorted(ranges, key=lambda x: x[start_key])
    saved = dict(ranges[0])

    for range_set in sorted(ranges, key=lambda x: x[start_key]):
        if range_set[start_key] <= saved[end_key]:
            saved[end_key] = max(saved[end_key], range_set[end_key])
        else:
            yield dict(saved)
            saved[start_key] = range_set[start_key]
            saved[end_key] = range_set[end_key]
    yield dict(saved)

Data:

data = [
    {'start_time': '09:00:00', 'end_time': '11:30:00'},
    {'start_time': '15:00:00', 'end_time': '15:30:00'},
    {'start_time': '11:00:00', 'end_time': '14:30:00'},
    {'start_time': '09:30:00', 'end_time': '14:00:00'}
]

Execution:

print(list(merge_range(ranges=data, start_key='start_time', end_key='end_time')))

Output:

[
    {'start_time': '09:00:00', 'end_time': '14:30:00'},
    {'start_time': '15:00:00', 'end_time': '15:30:00'}
]

When using Python 3.7, following the suggestion given by “RuntimeError: generator raised StopIteration” every time I try to run app, the method outputAsRanges from @UncleZeiv should be:

    def outputAsRanges(b):
        while b:
            try:
                yield (next(b), next(b))
            except StopIteration:
                return

继续阅读：algorithm merge python

Merging a list of time-range tuples that have overlapping time-ranges

-->

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

-->