A faster way of comparing two lists of point-tuples?

2023-03-11 08:24 问答作者：

I have two lists (whi开发者_StackOverflow社区ch may or may not be the same length). In each list, are a series of tuples of two points (basically X, Y values).

I am comparing the two lists against each other to find two points with similar point values. I have tried list comprehension techniques, but it got really confusing with the nested tuples inside of the lists and I couldn't get it to work.

Is this the best (fastest) way of doing this? I feel like there might be a more Pythonic way of doing this.

Say I have two lists:

pointPairA = [(2,1), (4,8)]
pointPairB = [(3,2), (10,2), (4,2)]

And then an empty list for storing the pairs and a tolerance value to store only matched pairs

matchedPairs = []
tolerance = 2

And then this loop that unpacks the tuples, compares the difference, and adds them to the matchedPairs list to indicate a match.

for pointPairA in pointPairListA:
    for pointPairB in pointPairListB:
        ## Assign the current X,Y values for each pair
        pointPairA_x, pointPairA_y = pointPairA
        pointPairB_x, pointPairB_x = pointPairB

        ## Get the difference of each set of points
        xDiff = abs(pointPairA_x - pointPairB_x)
        yDiff = abs(pointPairA1_y - pointPairB_y)

        if xDiff < tolerance and yDiff < tolerance:
            matchedPairs.append((pointPairA, pointPairB))

That would result in matchedPairs looking like this, with tuples of both point tuples inside:

[( (2,1), (3,2) ), ( (2,1), (4,2) )]

Here pointpairA is the single list and pointpairB would be one of the list of 20k

from collections import defaultdict
from itertools import product

pointPairA = [(2,1), (4,8)]
pointPairB = [(3,2), (10,2), (4,2)]
tolerance = 2

dA = defaultdict(list)
tolrange = range(-tolerance, tolerance+1)
for pA, dx, dy in product(pointPairA, tolrange, tolrange):
    dA[pA[0]+dx,pA[1]+dy].append(pA)

# you would have a loop here though the 20k lists
matchedPairs = [(pA, pB) for pB in pointPairB for pA in dA[pB]]  

print matchedPairs

If these lists are large, I would suggest finding a faster algorithm...

I would start by sorting both lists of pairs by the sum of the (x,y) in the pair. (Because two points can be close only if their sums are close.)

For any point in the first list, that will severely limit the range you need to search in the second list. Keep track of a "sliding window" on the second list, corresponding to the elements whose sums are within 2*tolerance of the sum of the current element of the first list. (Actually, you only need to keep track of the start of the sliding window...)

Assuming tolerance is reasonably small, this should convert your O(n^2) operation into O(n log n).

With list comprehension:

[(pa, pb) for pa in pointPairA for pb in pointPairB \
          if abs(pa[0]-pb[0]) <= tolerance and abs(pa[1]-pb[1]) <= tolerance]

Slightly much faster than your loop:

(for 1 million executions)

>>> (list comprehension).timeit()
2.1963138580322266 s

>>> (your method).timeit()
2.454944133758545 s

继续阅读：list-comprehension python

A faster way of comparing two lists of point-tuples?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？