开发者

Which is the fastest way to search a value within a set of many "range" objects in Python

I have a list of many Python objects like this:

class RangeClass(object):

    def __init__(self,address,size):
        self.address=address
      开发者_如何学编程  self.size=size
        #other attributes and methods...

Then, I have a list (rangelist) of RangeClass objects.

I need to find within which range a given value is.

I can use some code like this:

for r in ragelist:
    if(value>=r.address and value<(r.address+r.size)):
        return r
return None

But I think there is a faster way. Ranges have arbitrary size, but we can assume that they don't overlap.

Thank you.


If you have many values to test, then you could use the bisect module to find which range the values are in more quickly.

If

  • m = the number of values to test, and
  • n = len(rangelist)

then looping through the values and rangelist as you suggest would take O(m*n) time.

If you use bisection, then you must first sort the starting addresses O(nlogn) and find each value's place in rangelist O(m*logn). So if

O(nlogn + m*logn) < O(m*n)

then bisection wins. For large n, O(m*logn) is miniscule compared to O(m*n). So the inequality above would be true if

O(nlogn) < O(m*n)

or equivalently, when

C log(n) < m

for some constant C.


Thus, when n is large and C log(n) < m, you might try something like

import bisect

class RangeClass(object):

    def __init__(self,address,size):
        self.address=address
        self.size=size
    def __str__(self):
        return '[{0},{1})'.format(self.address,self.address+self.size)
    def __lt__(self,other):
        return self.address<other.address

rangelist=sorted([RangeClass(i,1) for i in (1,3,4,5,7.5)])
starts=[r.address for r in rangelist]

def find_range(value):
    start_idx=bisect.bisect(starts,value)-1
    try:
        r=rangelist[start_idx]
    except IndexError:
        return None
    start=r.address
    end=r.address+r.size
    if start<=value<end:
        return rangelist[start_idx]
    return None    

print(','.join(str(r) for r in rangelist))

for value in (0,1,1.5,2,3,4,5,6,7,8,9,10):
    r=find_range(value)
    if r:
        print('{v} in {r}'.format(v=value,r=str(r)))
    else:
        print('{v} not in any range'.format(v=value))


Not really. All you can do is take advantage of Python's relational operator chaining.

if r.address <= value < (r.address + r.size):

You could also define __contains__ on RangeClass to allow you to use in to find it instead.

class RangeClass(object):
   ...
  def __contains__(self, val):
    return self.address <= val < (self.address + self.size)

 ...
  if value in r:


Implement the comparison operator in Range, have the range list sorted, and use bisect to search for the range a value belongs in:

import bisect
def find_range(value):
    index = bisect.bisect(rangelist, value)
    if index not in (0, len(rangelist)):
        index -= 1
    return rangelist[index]


Thank you to everyone,

I am actually using the method proposed by unutbu.

Moreover, I add another optimization:

if(value <= first_range_value or value >= last_range_value):
    return None

Where first_range_value and last_range_value have been computed before, and they are the smallest r.address value and the largest r.address+r.size value .

This is worthing in my application, but it really depends on the distribution of ranges and values.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜