A faster way of comparing two lists of point-tuples?
I have two lists (whi开发者_StackOverflow社区ch may or may not be the same length). In each list, are a series of tuples of two points (basically X, Y values).
I am comparing the two lists against each other to find two points with similar point values. I have tried list comprehension techniques, but it got really confusing with the nested tuples inside of the lists and I couldn't get it to work.
Is this the best (fastest) way of doing this? I feel like there might be a more Pythonic way of doing this.
Say I have two lists:
pointPairA = [(2,1), (4,8)]
pointPairB = [(3,2), (10,2), (4,2)]
And then an empty list for storing the pairs and a tolerance value to store only matched pairs
matchedPairs = []
tolerance = 2
And then this loop that unpacks the tuples, compares the difference, and adds them to the matchedPairs list to indicate a match.
for pointPairA in pointPairListA:
for pointPairB in pointPairListB:
## Assign the current X,Y values for each pair
pointPairA_x, pointPairA_y = pointPairA
pointPairB_x, pointPairB_x = pointPairB
## Get the difference of each set of points
xDiff = abs(pointPairA_x - pointPairB_x)
yDiff = abs(pointPairA1_y - pointPairB_y)
if xDiff < tolerance and yDiff < tolerance:
matchedPairs.append((pointPairA, pointPairB))
That would result in matchedPairs looking like this, with tuples of both point tuples inside:
[( (2,1), (3,2) ), ( (2,1), (4,2) )]
Here pointpairA is the single list and pointpairB would be one of the list of 20k
from collections import defaultdict
from itertools import product
pointPairA = [(2,1), (4,8)]
pointPairB = [(3,2), (10,2), (4,2)]
tolerance = 2
dA = defaultdict(list)
tolrange = range(-tolerance, tolerance+1)
for pA, dx, dy in product(pointPairA, tolrange, tolrange):
dA[pA[0]+dx,pA[1]+dy].append(pA)
# you would have a loop here though the 20k lists
matchedPairs = [(pA, pB) for pB in pointPairB for pA in dA[pB]]
print matchedPairs
If these lists are large, I would suggest finding a faster algorithm...
I would start by sorting both lists of pairs by the sum of the (x,y) in the pair. (Because two points can be close only if their sums are close.)
For any point in the first list, that will severely limit the range you need to search in the second list. Keep track of a "sliding window" on the second list, corresponding to the elements whose sums are within 2*tolerance
of the sum of the current element of the first list. (Actually, you only need to keep track of the start of the sliding window...)
Assuming tolerance
is reasonably small, this should convert your O(n^2) operation into O(n log n).
With list comprehension:
[(pa, pb) for pa in pointPairA for pb in pointPairB \
if abs(pa[0]-pb[0]) <= tolerance and abs(pa[1]-pb[1]) <= tolerance]
Slightly much faster than your loop:
(for 1 million executions)
>>> (list comprehension).timeit()
2.1963138580322266 s
>>> (your method).timeit()
2.454944133758545 s
精彩评论