开发者

How to find range overlap in python?

What is the best way in Python to determine what values in two ranges overlap?

For example:

x = range(1,10)
y = range(8,20)

(The answer I am looking for would be the integers 8 and 9.)
        

Given a range, x, what is the best way to iterate through another range, y and output all values that are shared by both ranges?

EDIT:

As a follow-up, I realized that I also need to know if x does or does not overlap y. I am looking for a way to iterate through a list of ranges and and d开发者_JAVA技巧o a number of additional things with range that overlap. Is there a simple True/False statement to accomplish this?


If the step is always +1 (which is the default for range) the following should be more efficient than converting each list to a set or iterating over either list:

range(max(x[0], y[0]), min(x[-1], y[-1])+1)


Try with set intersection:

x = range(1,10)
y = range(8,20)
xs = set(x)
xs.intersection(y)   

Output:

set([8, 9])

Note that intersection accepts any iterable as an argument (y is not required to be converted to a set for the operation). There is an operator equivalent to the intersection method: & but, in this case, it requires both arguments to be sets.


You can use sets for that, but be aware that set(list) removes all duplicate entries from the list:

>>> x = range(1,10)
>>> y = range(8,20)
>>> list(set(x) & set(y))
[8, 9]


The answers above seem mostly overly complex. This one liner works perfectly in Python3, takes ranges as inputs and output. It also handles illegal ranges. To get the values just iterate over the result if not None.

# return overlap range for two range objects or None if no ovelap
# does not handle step!=1
def range_intersect(r1, r2):
    return range(max(r1.start,r2.start), min(r1.stop,r2.stop)) or None


If you looking for the overlap between two real-valued bounded intervals, then this is quite nice:

def overlap(start1, end1, start2, end2):
    """how much does the range (start1, end1) overlap with (start2, end2)"""
    return max(max((end2-start1), 0) - max((end2-end1), 0) - max((start2-start1), 0), 0)

I couldn't find this online anywhere so I came up with this and I'm posting here.


One option is to just use list comprehension like:

x = range(1,10) 
y = range(8,20) 

z = [i for i in x if i in y]
print z


This is the answer for the simple range with step=1 case (99% of the time), which can be 2500x faster as shown in the benchmark when comparing long ranges using sets (when you are just interested in knowing if there is overlap):

x = range(1,10) 
y = range(8,20)

def range_overlapping(x, y):
    if x.start == x.stop or y.start == y.stop:
        return False
    return x.start <= y.stop and y.start <= x.stop

>>> range_overlapping(x, y)
True

To find the overlapping values:

def overlap(x, y):
    if not range_overlapping(x, y):
        return set()
    return set(range(max(x.start, y.start), min(x.stop, y.stop)+1))

Visual help:

|  |           |    |
  |  |       |    |

Benchmark:

x = range(1,10)
y = range(8,20)

In [151]: %timeit set(x).intersection(y)
2.74 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [152]: %timeit range_overlapping(x, y)
1.4 µs ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Conclusion: even for small ranges, it is twice as fast.

x = range(1,10000)
y = range(50000, 500000)

In [155]: %timeit set(x).intersection(y)
43.1 ms ± 158 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [156]: %timeit range_overlapping(x, y)
1.75 µs ± 88.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Conclusion: you want to use the range_overlapping function in this case as it is 2500x faster (my personal record in speedup)


For "if x does or does not overlap y" :

for a,b,c,d in ((1,10,10,14),
                (1,10,9,14),
                (1,10,4,14),
                (1,10,4,10),
                (1,10,4,9),
                (1,10,4,7),
                (1,10,1,7),
                (1,10,-3,7),
                (1,10,-3,2),
                (1,10,-3,1),
                (1,10,-11,-5)):
    x = range(a,b)
    y = range(c,d)
    print 'x==',x
    print 'y==',y
    b = not ((x[-1]<y[0]) or (y[-1]<x[0]))
    print '    x %s y' % ("does not overlap","   OVERLAPS  ")[b]
    print

result

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [10, 11, 12, 13]
    x does not overlap y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [9, 10, 11, 12, 13]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [4, 5, 6, 7, 8, 9]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [4, 5, 6, 7, 8]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [4, 5, 6]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [1, 2, 3, 4, 5, 6]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [-3, -2, -1, 0, 1]
    x    OVERLAPS   y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [-3, -2, -1, 0]
    x does not overlap y

x== [1, 2, 3, 4, 5, 6, 7, 8, 9]
y== [-11, -10, -9, -8, -7, -6]
    x does not overlap y

Edit 1

Speeds comparison:

from time import clock

x = range(-12,15)
y = range(-5,3)
te = clock()
for i in xrange(100000):
    w = set(x).intersection(y)
print '                     set(x).intersection(y)',clock()-te


te = clock()
for i in xrange(100000):
    w = range(max(x[0], y[0]), min(x[-1], y[-1])+1)
print 'range(max(x[0], y[0]), min(x[-1], y[-1])+1)',clock()-te

result

                     set(x).intersection(y) 0.951059981087
range(max(x[0], y[0]), min(x[-1], y[-1])+1) 0.377761978129

The ratio of these execution's times is 2.5


If you want to find the overlap of ranges with arbitrary steps you can use my package https://github.com/avnr/rangeplus which provides a Range() class compatible with Python range(), plus some goodies including intersections:

>>> from rangeplus import Range
>>> Range(1, 100, 3) & Range(2, 100, 4)
Range(10, 100, 12)
>>> Range(200, -200, -7) & range(5, 80, 2)  # can intersect with Python range() too
Range(67, 4, -14)

Range() can also be unbound (when stop is None the Range goes on to +/-infinity):

>>> Range(1, None, 3) & Range(3, None, 4)
Range(7, None, 12)
>>> Range(253, None, -3) & Range(208, 310, 5)
Range(253, 207, -15)

The intersection is computed, not iterated, which makes the efficiency of the implementation independent of the length of the Range().


This solution generates integers that are in the intersection of an arbitrary number of range objects in O(1) memory.

Disclosure: I got this from a user in Python Chat after I tried something else... less elegant.

Solution

def range_intersection(*ranges):
    ranges = set(ranges)  # `range` is hashable so we can easily eliminate duplicates
    if not ranges: return
    
    shortest_range = min(ranges, key=len)  # we will iterate over one, so choose the shortest one
    ranges.remove(shortest_range)          # note: `range` has a length, so we can use `len`
    
    for i in shortest_range:
        if all(i in range_ for range_ in ranges): yield i  # Finally, `range` implements `__contains__`
                                                           # by checking if an iteger satisfies it's simple formula

Testing

OP's problem

x = range(1,10)
y = range(8,20)
list(range_intersection(x, y))

[8, 9]

My example

limit = 10_000
list(range_intersection(
    range(2, limit, 2),
    range(3, limit, 3),
    range(5, limit, 5),
    range(41, limit, 41),
))

[1230, 2460, 3690, 4920, 6150, 7380, 8610, 9840]


Assuming you are working exclusively with ranges, with a step of 1, you can do it quickly with math.

def range_intersect(range_x,range_y):
    if len(range_x) == 0 or len(range_y) == 0:
        return []
    # find the endpoints
    x = (range_x[0], range_x[-1]) # from the first element to the last, inclusive
    y = (range_y[0], range_y[-1])
    # ensure min is before max
    # this can be excluded if the ranges must always be increasing
    x = tuple(sorted(x))
    y = tuple(sorted(y))
    # the range of the intersection is guaranteed to be from the maximum of the min values to the minimum of the max values, inclusive
    z = (max(x[0],y[0]),min(x[1],y[1]))
    if z[0] < z[1]:
        return range(z[0], z[1] + 1) # to make this an inclusive range
    else:
        return [] # no intersection

On a pair of ranges each with over 10^7 elements, this took under a second, independent of how many elements overlapped. I tried with 10^8 or so elements, but my computer froze for a while. I doubt you'd be working with lists that long.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜