Approximate, incremental nearest-neighbour algorithm for moving bodies
Bounty
This question raises several issues. The bounty will go to an answer which addresses them holistically.
Here's a problem I've been playing with.
NOTE I'm especially interested in solutions that are not based in Euclidian space.
There is a set of Actors which form a crowd of size K. The distance d(ActorA,ActorB)
is easily computable for any two actors (solutions should work for various definitions of 'distance') and we can find the set of N nearest neighbours for any given Actor using any of a number of established algorithms.
This neighbour set is correct at the first instant but the Actors are always moving and I want to maintain the evolving list of N nearest neighbours for each Actor. What I am interested in is approximate solutions which are more efficient than perfect solutions.
- Solutions should converge to correctness after errors have been introduced.
- It is acceptable to sometimes perform a full recomputation if the errors become too large but detecting these errors should be cheap.
So far I have been using a friend-of-a-friend algorithm:
recompute_nearest (Actor A)
{
Actor f_o_f [N*N];
for each Actor n in A.neighbours
append n to f_o_f if n != A and n not in f_o_f
Distance distances [N*N];
for 0 <= i < f_o_f.size
distances [i] = distance (A, f_o_f [i])
sort (f_o_f, distances)
A .neighbours = first N from f_o_f
}
This performs reasonably well when the crowd is slow-moving and N is suitably large. It converges after small errors, satisfying the first criteria, but
- I don't have good way to detect large errors,
- I have no quantitative description of the size and frequency of errors,
- it converges in practice but I can't prove that it always will.
Can you help with any of these points?
Also, do you know of any alternative approaches which perform well
- when the crowd is fast-moving,
- when some actors are fast-moving,
- when N is small,
- when the crowd is sparse in some places and dense in others,
- or with particular spacial-indexing algorithms?
The extension I'm working on at the moment is to generalise friend-of-a-friend to take the friend-of-a-开发者_开发知识库friend-of-a-friend in cases when a neighbour is fast-moving. I suspect that this doesn't scale to well and it's hard to derive the right parameters without a quantification of the errors.
I welcome all suggestions! It's a fun little problem :-)
Notable Suggestions So Far
Fexvez: sample random extra neighbours, sample size depending on the speed of the Agent. Sampling from the area it's about to move into would probably also help.
Resample the neighbours when an agents speed*delta_time
exceeds the distance to the furthest known neighbour.
Maintain the Delaunay triangulation which is a superset of the nearest-neighbour graph. Only accounts for one nearest neighbour.
David Mount's ANN library Doesn't seem to handle moving bodies.
A very simple and very fast method from statistics is to use random linear projections. These can help you determine clusters and neighbors very quickly. With more projections, you get more accuracy (addressing your question about errors, I believe).
This paper offers an extensive quantitative analysis of several methods, including a new method (DPES) that is related to RLP.
This paper addresses use of RLP including for distance preservation even in the context of moving points.
This paper addresses RLP for motion planning and details several heuristics.
RLP methods are:
- Very fast
- Lead to approximations that are tunable for accuracy and speed
- Distance and angle preserving (provable)
- Easily scaled to large dimensions and large #s of objects
- Useful for dimensionality reduction
- Lead to compact projections (e.g. one could project into a hierarchical binary partitioning)
- Flexible: you can project into whatever space you feel would be good for you - usually R^d, but projecting into 2^d (i.e. binary space of dimension d) is also allowable, only subject to a reduction in accuracy for a given # of projections.
- Statistically interesting
After embedding into a lower dimensional space, neighbor calculations are very easy, as projections that are, say, binned in the same regions (if you bin the projections into a grid) are very likely to be close in the original space.
Although the dimensionality of the original data is small (even 10 is small), the ability to rapidly project into a pre-selected grid is very useful for identifying and counting neighbors.
Finally, you only need to update for those objects whose location (or relative location, if you're centering and scaling the data) have changed.
For related works, look into the Johnson-Lindenstrauss lemma.
Have you tried using a quad tree?
That is, recursively partition the "world" into a graph with four subnodes each. The tree can then quickly check which objects are inside a particular square of the world and discard the rest. A very effective culling technique often used for improving performance of collision detection in games.
If this is 3D, extend it to an octree.
Here's a simple counterexample showing that your friend-of-a-friend algorithm will sometimes miss neighbors: start with a long straight line of many (at least more than 3*N) equidistantly spaced actors, and gradually bend the line into a circle. If the line is long enough, the actors in it will detect no local change in their neighborhoods; in particular, the actors at the ends won't notice when they meet.
Edit: Actually, I thought of an even simpler counterexample: take two isolated compact clusters of N or more actors each, and send them on a collision course with each other.
I've got something that somewhat looks like a solution.
At each "recompute" step, it takes a linear time, which I guess is not that much a problem as you do recompute_nearest (Actor A)
for each actor.
Let's get to the point : the idea is for each actor, add a certain number of random friends just before computing recompute_nearest (Actor A)
. This number should not be to small, otherwise you'll never detect errors. It shouldn't be too big, else as computing neighbours of neighbours is quadratic, it will be computationally too expensive.
But what does "not too small" or "not too big" mean? We'll start with one actor A and see how his neighbour list will be updated. Let's suppose that for each point, you add p
percent of the original K actors. Then, if another point B comes close to A but is not a friend of a friend, we should "quickly" add him via the "random friends" thing. At each iteration, there is a (1-p)
chance to not choose him.
A quick computation shows that if you want to spot this point in 10 iterations or less in 90% of the case, you'll need p=20%
. This is horribly expensive.
...
However, all is not lost! The first point I want to make is that if errors tend to "go together" then the performance dramatically increases! Imagine two blobs that are initially far apart (people chatting, star clusters, etc...) If these blobs collide, the friend-of-friend will never see that there is a problem. But, with my algorithm, if you spot one single error, and correctly adapt your neighbour list, then, the friend-of-friend algorithm will correct all of the remaining errors.
If you have K=1.000
and two small blobs containing each only 1% of the points and you want to spot in 10 iterations or less with 99% certainty, you can compute that you'll only need p=1%
! (The bigger K is, the smaller p needs to be!) Well, that supposes that some points "go together".
I've another optimisation : adapt the number and quality of random points. Simply put, fast actors should be given more random friends than slow actors. And you should randomized these friends prefering other fast actors. I can't quantize it, but you can guess why it'll improve the performance!
A small edit concerning your question : "What do I do when the actors are fast"? You can translate the speed of actors with the number of steps you give yourself to spot an error. What I mean is that if you can consider that each actor has a circle around him of his friends. That theoretical circle corresponds to his friend list. If most of the actors can't completely cross this circle in 10 iterations, but can in 15, then you should consider that you give yourself 10 iterations to spot the problem. If your actors are so fast that they can cross this "circle" in 3 steps, then, well... this algorithm is not for you (and I guess it's illusory to try to keep a neighbour list without recomputing it at each step)
Fundamentally, keeping a neighbour list suggests that there is some structure (i.e something roughly the same between two iterations). Speed is the contrary, fast actors mean that you have no structure. In the case of very fast actors, keeping neighbours list is probably a bad idea.
Technical detail about the blobs.
You have two blobs that contain each s%
of the actors (size sK
) (example is s = 1%
)
You give yourself an error margin of e%
(example is 1%
) and n
steps (example is 10)
Then there is an upper bound for p : p <= [log(1-e^(1/n))]/[s*K²*log(1-s)]
The true value of my example is p <= 0.989%
!
You've got p = [log(1-e^(1/n))]/[s*K²*log(1-s)]
if s is small and K is big. If it's not the case, p is much smaller!
What I would try.
Cover trees as the base data structure for doing kNN. Features: doesn't need a Euclidean metric, linear space usage, kNN/Insert/Remove are all O(log n) when the intrinsic dimensionality of the data is fixed. Non-feature: motion.
To handle moving objects, periodically, for each object, remove its old position, insert its new position, and find the kNN.
If we set the period of an object inversely proportional to speed, then we know that the maximum discrepancy between the cover tree and reality is bounded by a constant.
KD Trees allow you to efficiently search within a fixed radius of a point. If all of a point's neighbour's are within d1 units and maximum displacement of any point from the previous measurement is d2, then you only need to search within d1+d2 units of the point.
If you were dealing with a Minkowski distance, then you could have used David Mount's ANN library. I am not sure, if there is a library that would take arbitrary distance functions, though.
When the distance that an Actor moves in a given timestep exceeds the distance to its furthest known neighbour, resample all neighbours.
Another trigger: when the distance an Actor has moved since the last resample exceeds the distance to the furthest known neighbour, resample.
Trivial optimisation goes well here, if there is a hierarchial spatial index, search for a node containing both a) a sphere of radius equal to the jump size, and b) at least N Actors.
精彩评论