开发者

Choose the closest k points from given n points

You are given a set U开发者_开发百科 of n points on the plane and you can compute distance between any pair of points in constant time. Choose a subset of U called C such that C has exactly k points in it and the distance between the farthest 2 points in C is as small as possible for given k. 1 < k <= n

What's the fastest way to do this besides the obvious n-choose-k solution?


A solution is shown in Finding k points with minimum diameter and related problems - Aggarwal, 1991. The algorithm described therein has running time: O(k^2.5 n log k + n log n)

For those who have no access to the paper: the problem is called k-diameter and definied as

Find a set of k points with minimum diameter. The diameter of a set is the maximum distance between any points of the set.

I cannot really give an overview over the presented algorithm, but it includes computing the (3k - 3)th order Voronoi diagram of the points, and then solve the problem for each of the O(kn) Voronoi sets (by computing maximum independent sets in some bipartite graphs)... I guess that I am trying to say is, that it is highly non-trivial, and far beyond both an interview and this site :-)


Since this is an interview question, here is my shot at a solution. (As dcn points out below, this is not guaranteed to return the optimal solution, though it should still be a decent heuristic. Good catch, dcn!)

  1. Create a set Sp with a single point P.
  2. Compute the distance between every point in Sp and every point outside of it, then add the point with the smallest max distance to Sp.
  3. Repeat 2. until Sp has k points.
  4. Repeat 1-3 using each point once as the initial P. Take the Sp which has the smallest max distance.

There are O(k) points in Sp, and O(n) points outside of it, so finding the point with the smallest max distance is O(nk). We repeat this k times, then repeat the whole procedure n times, for an overall complexity of O(n2k2).

We can improve on this by caching the max distance between any point in Sp and each point outside of Sp. If maxDistanceFromPointInS[pointOutsideS] is, say, an O(1) hash-table containing the current max distance between every point pointOutsideS and some point inside Sp, then every time we add a new point newPoint, we set maxDistanceFromPointInS[p] = Max(maxDistanceFromPointInS[p], distance(newPoint, p)) for all points p outside of Sp. Then finding the smallest max distance is O(n), adding a point to Sp is O(n). Repeating this k times gives us O((n+n)k) = O(nk). Finally, we repeat the whole procedure n times, for an overall complexity of O(n2k).

We could improve finding the smallest max distance to O(1) using a heap, but that would not change the overall complexity.


By the way, it took an hour to write this all up - there's no way I could have done this in an interview.


This is still a messy idea, I am not sure if it actually works It does not work. Leaving the wrong answer here for posterity.

For each point in U
    make a list of the distance to each point in U
    sort the list
    add largest distance to a max-heap.
while any of the lists have more than k elements
    remove max of heap twice
    remove corresponding elements from the two lists they came from
    add the two newly exposed largest elements from those two lists to the heap
Any of the lists left with k elements will list the elements in C

Basically, find the two points who currently look like they could both be in the subset together, and which have the largest pairwise distance, and then rule out their both being in the subset together. Repeat until you are left with only one possible way to form a k-sized subset.

This should be time complexity O((n^2)log(n)) and space complexity O(n^2).


Its doable in deterministic polynomial time apparently.

But I don't understand their algorithm. It seems the circles they choose are always one of the input points. Can someone shed some light on this or explain neatly what they do (just the section 2 would suffice)?


The actual problem that was asked seems difficult. If the points were not in the plane and had arbitrary distances, it would be NP-hard by reduction from k-clique (take an arbitrary unweighted graph, and add edges of length infinity for missing edges, and edges of length 1 for existing edges -- a clique of size k exists whenever the 'closest k points' have a maximum mutual distance of 1 rather than infinity). However, since the points are constrained to being in the plane so that such distance labelings are forbidden, it's possible that there may be a solution. At least, it seems amenable to close approximation.

If your interview meant 'smallest diameter circle enclosing k points', then the following might be the fastest correct algorithm you could reasonably come up with by yourself in an interview:

For each set of size 3, compute the smallest circle that contains these points, and check whether each point is contained in this circle. If the number contained is at least k, update the minimum diameter accordingly.

Essentially, this is a quadruply nested for loop, so the running time is O(n^4).


This should be a good approximate solution if you don't have too many outliers

p = mean center (average x, y) of U
M = U sorted by distance to p
C = M[0:k]
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜