Find closest point of every point (Nearest Neighbor)
I am writing a method that takes as input an array of points and finds, for each point in the array, the closest point to it other than itself. I am currently doing this in a brute force way (cheking every point with every other point). My current implimentation doesn't have the array sorted but i can sort it by p.x values with开发者_Python百科 the CompareByX method. I am chekcking the running time of the algorithm, and it gets very time consuming with large values of n. I am not very knowledgable on this subject and know very littel about different types of data structures, any simple help would be great!
My current code is:
import java.util.*;
import java.lang.*;
import java.io.*;
class My2dPoint {
double x;
double y;
public My2dPoint(double x1, double y1) {
x=x1;
y=y1;
}
}
class CompareByX implements Comparator<My2dPoint> {
public int compare(My2dPoint p1, My2dPoint p2) {
if (p1.x < p2.x) return -1;
if (p1.x == p2.x) return 0;
return 1;
}
}
/* An object of the above comparator class is used by java.util.Arrays.sort() in main to sort an array of points by x-coordinates */
class Auxiliaries {
public static double distSquared(My2dPoint p1, My2dPoint p2) {
double result;
result = (p1.x-p2.x)*(p1.x-p2.x) + (p1.y-p2.y)*(p1.y-p2.y);
return result;
}
}
public class HW3 {
public static void main (String argv []) throws IOException {
int range = 1000000; // Range of x and y coordinates in points
System.out.println("Enter the number of points");
InputStreamReader reader1 = new InputStreamReader(System.in);
BufferedReader buffer1 = new BufferedReader(reader1);
String npoints = buffer1.readLine();
int numpoints = Integer.parseInt(npoints);
// numpoints is now the number of points we wish to generate
My2dPoint inputpoints [] = new My2dPoint [numpoints];
// array to hold points
int closest [] = new int [numpoints];
// array to record soln; closest[i] is index of point closest to i'th
int px, py;
double dx, dy, dist;
int i,j;
double currbest;
int closestPointIndex;
long tStart, tEnd;
for (i = 0; i < numpoints; i++) {
px = (int) ( range * Math.random());
dx = (double) px;
py = (int) (range * Math.random());
dy = (double) py;
inputpoints[i] = new My2dPoint(dx, dy);
}
// array inputpoints has now been filled
tStart = System.currentTimeMillis();
// find closest [0]
closest[0] = 1;
currbest = Auxiliaries.distSquared(inputpoints[0],inputpoints[1]);
for (j = 2; j < numpoints; j++) {
dist = Auxiliaries.distSquared(inputpoints[0],inputpoints[j]);
if (dist < currbest) {
closest[0] = j;
currbest = dist;
}
}
// now find closest[i] for every other i
for (i = 1; i < numpoints; i++) {
closest[i] = 0;
currbest = Auxiliaries.distSquared(inputpoints[i],inputpoints[0]);
for (j = 1; j < i; j++) {
dist = Auxiliaries.distSquared(inputpoints[i],inputpoints[j]);
if (dist < currbest) {
closest[i] = j;
currbest = dist;
}
}
for (j = i+1; j < numpoints; j++) {
dist = Auxiliaries.distSquared(inputpoints[i],inputpoints[j]);
if (dist < currbest) {
closest[i] = j;
currbest = dist;
}
}
}
tEnd = System.currentTimeMillis();
System.out.println("Time taken in Milliseconds: " + (tEnd - tStart));
}
}
Brute force for nearest neighbour search is only feasible for a small number of points.
You might want to look into kd-Trees or spatial data structures generally.
Here is a demo for kd-Tree. This is what wikipedia says.
I would definitely sort by x first. Then I would use the x distance between points as a quick reject test: once you have the distance to one neighbor, any closer neighbor has to be closer in x. This avoids all the distSquared computations for points outside the x range. Every time you find a closer neighbor, you also tighten up the range of x that you need to search.
Also, if P2 is the closest neighbor to P1, then I would use P1 as the initial guess for the closest neighbor to P2.
EDIT: On second thought, I'd sort by whichever dimension has the largest range.
There are some fairly standard ways of improving this kind of search, and how complicated you want to get depends on how many points you are searching.
A fairly common easy one is to sort the points by X or Y. For each point you then look for near points, going both forwards and backwards in the array. Remember how far away the nearest one you have found is, and when the difference in X (or Y) is greater than that you know there can't be any nearer point left to find.
You can also partition your space using a tree. Wikipedia has a page that gives some possible algorithms. Sometimes the cost to set them up is larger than what you save. That's the sort of thing you have to decide based on how many points you are searching.
Either use a kd-tree, or use a good library for nearest neighbor search. Weka includes one.
Another possibility, simpler than creating a kd-tree, is using a neighborhood matrix.
First place all your points into a 2D square matrix. Then you can run a full or partial spatial sort, so points will became ordered inside the matrix.
Points with small Y could move to the top rows of the matrix, and likewise, points with large Y would go to the bottom rows. The same will happen with points with small X coordinates, that should move to the columns on the left. And symmetrically, points with large X value will go to the right columns.
After you did the spatial sort (there are many ways to achieve this, both by serial or parallel algorithms) you can lookup the nearest points of a given point P by just visiting the adjacent cells where point P is actually stored in the neighborhood matrix.
You can read more details for this idea in the following paper (you will find PDF copies of it online): Supermassive Crowd Simulation on GPU based on Emergent Behavior.
The sorting step gives you interesting choices. You can use just the even-odd transposition sort described in the paper, which is very simple to implement (maybe even in CUDA). If you run just one pass of this, it will give you a partial sort, which can be already useful if your matrix is near-sorted. That is, if your points move slowly, it will save you a lot of computation.
If you need a full sort, you can run such even-odd transposition pass several times (as described in the following Wikipedia page):
http://en.wikipedia.org/wiki/Odd%E2%80%93even_sort
If the changes are small, one or two even-odd passes will suffice to get the array sorted again.
If your points are relatively close together, you can sort by distance from some point (I think it can be any point, but it may have to be a point for which all the points are in the same quadrant if that point is treated as the origin).
Lets say the point of interest is point A and has distance D.
Pick the closest point that is within some relatively small n indexes from the point A in the sorted list (using a larger n provides for a probably better initial guess, but will take longer). If that point has linear distance g from point A, you know that that the closest point has to be at most g from A. This way you only have to consider points in the list with distance between D-g and D+g.
Drawing out a chart might help to understand it. If anybody cares I'll add a diagram.
精彩评论