开发者

A faster way of comparing two lists of point-tuples?

I have two lists (whi开发者_StackOverflow社区ch may or may not be the same length). In each list, are a series of tuples of two points (basically X, Y values).

I am comparing the two lists against each other to find two points with similar point values. I have tried list comprehension techniques, but it got really confusing with the nested tuples inside of the lists and I couldn't get it to work.

Is this the best (fastest) way of doing this? I feel like there might be a more Pythonic way of doing this.

Say I have two lists:

pointPairA = [(2,1), (4,8)]
pointPairB = [(3,2), (10,2), (4,2)]

And then an empty list for storing the pairs and a tolerance value to store only matched pairs

matchedPairs = []
tolerance = 2

And then this loop that unpacks the tuples, compares the difference, and adds them to the matchedPairs list to indicate a match.

for pointPairA in pointPairListA:
    for pointPairB in pointPairListB:
        ## Assign the current X,Y values for each pair
        pointPairA_x, pointPairA_y = pointPairA
        pointPairB_x, pointPairB_x = pointPairB

        ## Get the difference of each set of points
        xDiff = abs(pointPairA_x - pointPairB_x)
        yDiff = abs(pointPairA1_y - pointPairB_y)

        if xDiff < tolerance and yDiff < tolerance:
            matchedPairs.append((pointPairA, pointPairB))

That would result in matchedPairs looking like this, with tuples of both point tuples inside:

[( (2,1), (3,2) ), ( (2,1), (4,2) )]


Here pointpairA is the single list and pointpairB would be one of the list of 20k

from collections import defaultdict
from itertools import product

pointPairA = [(2,1), (4,8)]
pointPairB = [(3,2), (10,2), (4,2)]
tolerance = 2

dA = defaultdict(list)
tolrange = range(-tolerance, tolerance+1)
for pA, dx, dy in product(pointPairA, tolrange, tolrange):
    dA[pA[0]+dx,pA[1]+dy].append(pA)

# you would have a loop here though the 20k lists
matchedPairs = [(pA, pB) for pB in pointPairB for pA in dA[pB]]  

print matchedPairs


If these lists are large, I would suggest finding a faster algorithm...

I would start by sorting both lists of pairs by the sum of the (x,y) in the pair. (Because two points can be close only if their sums are close.)

For any point in the first list, that will severely limit the range you need to search in the second list. Keep track of a "sliding window" on the second list, corresponding to the elements whose sums are within 2*tolerance of the sum of the current element of the first list. (Actually, you only need to keep track of the start of the sliding window...)

Assuming tolerance is reasonably small, this should convert your O(n^2) operation into O(n log n).


With list comprehension:

[(pa, pb) for pa in pointPairA for pb in pointPairB \
          if abs(pa[0]-pb[0]) <= tolerance and abs(pa[1]-pb[1]) <= tolerance]

Slightly much faster than your loop:

(for 1 million executions)

>>> (list comprehension).timeit()
2.1963138580322266 s

>>> (your method).timeit()
2.454944133758545 s
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜