Choose the closest k points from given n points

2023-02-20 17:03 问答作者：

You are given a set U开发者_开发百科 of n points on the plane and you can compute distance between any pair of points in constant time. Choose a subset of U called C such that C has exactly k points in it and the distance between the farthest 2 points in C is as small as possible for given k. 1 < k <= n

What's the fastest way to do this besides the obvious n-choose-k solution?

A solution is shown in Finding k points with minimum diameter and related problems - Aggarwal, 1991. The algorithm described therein has running time: O(k^2.5 n log k + n log n)

For those who have no access to the paper: the problem is called k-diameter and definied as

Find a set of k points with minimum diameter. The diameter of a set is the maximum distance between any points of the set.

I cannot really give an overview over the presented algorithm, but it includes computing the (3k - 3)th order Voronoi diagram of the points, and then solve the problem for each of the O(kn) Voronoi sets (by computing maximum independent sets in some bipartite graphs)... I guess that I am trying to say is, that it is highly non-trivial, and far beyond both an interview and this site :-)

Since this is an interview question, here is my shot at a solution. (As dcn points out below, this is not guaranteed to return the optimal solution, though it should still be a decent heuristic. Good catch, dcn!)

Create a set S_p with a single point P.
Compute the distance between every point in S_p and every point outside of it, then add the point with the smallest max distance to S_p.
Repeat 2. until S_p has k points.
Repeat 1-3 using each point once as the initial P. Take the S_p which has the smallest max distance.

There are O(k) points in S_p, and O(n) points outside of it, so finding the point with the smallest max distance is O(nk). We repeat this k times, then repeat the whole procedure n times, for an overall complexity of O(n²k²).

We can improve on this by caching the max distance between any point in S_p and each point outside of S_p. If maxDistanceFromPointInS[pointOutsideS] is, say, an O(1) hash-table containing the current max distance between every point pointOutsideS and some point inside S_p, then every time we add a new point newPoint, we set maxDistanceFromPointInS[p] = Max(maxDistanceFromPointInS[p], distance(newPoint, p)) for all points p outside of S_p. Then finding the smallest max distance is O(n), adding a point to S_p is O(n). Repeating this k times gives us O((n+n)k) = O(nk). Finally, we repeat the whole procedure n times, for an overall complexity of O(n²k).

We could improve finding the smallest max distance to O(1) using a heap, but that would not change the overall complexity.

By the way, it took an hour to write this all up - there's no way I could have done this in an interview.

This is still a messy idea, ~~I am not sure if it actually works~~ It does not work. Leaving the wrong answer here for posterity.

For each point in U
    make a list of the distance to each point in U
    sort the list
    add largest distance to a max-heap.
while any of the lists have more than k elements
    remove max of heap twice
    remove corresponding elements from the two lists they came from
    add the two newly exposed largest elements from those two lists to the heap
Any of the lists left with k elements will list the elements in C

Basically, find the two points who currently look like they could both be in the subset together, and which have the largest pairwise distance, and then rule out their both being in the subset together. Repeat until you are left with only one possible way to form a k-sized subset.

This should be time complexity O((n^2)log(n)) and space complexity O(n^2).

Its doable in deterministic polynomial time apparently.

But I don't understand their algorithm. It seems the circles they choose are always one of the input points. Can someone shed some light on this or explain neatly what they do (just the section 2 would suffice)?

The actual problem that was asked seems difficult. If the points were not in the plane and had arbitrary distances, it would be NP-hard by reduction from k-clique (take an arbitrary unweighted graph, and add edges of length infinity for missing edges, and edges of length 1 for existing edges -- a clique of size k exists whenever the 'closest k points' have a maximum mutual distance of 1 rather than infinity). However, since the points are constrained to being in the plane so that such distance labelings are forbidden, it's possible that there may be a solution. At least, it seems amenable to close approximation.

If your interview meant 'smallest diameter circle enclosing k points', then the following might be the fastest correct algorithm you could reasonably come up with by yourself in an interview:

For each set of size 3, compute the smallest circle that contains these points, and check whether each point is contained in this circle. If the number contained is at least k, update the minimum diameter accordingly.

Essentially, this is a quadruply nested for loop, so the running time is O(n^4).

This should be a good approximate solution if you don't have too many outliers

p = mean center (average x, y) of U
M = U sorted by distance to p
C = M[0:k]

继续阅读：algorithm computational-geometry geometry language-agnostic

Choose the closest k points from given n points

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？