pl/sql code for nearest neighbor query without index for a table with point data in oracle
i am trying to build a procedure to obtain k nearest neighbor points to a point with a selected ID. I need to do this without using any spatial locator features like sdo_geometry or nn.
basically i have a table in oracle with ID, Data_X, Data_Y. let's say i have 10 entries in my table, and i need the 3 nearest points to a fictional point target_x, target_y.
we would need to calculate the euclidean distance of each point in the table with my given fictional point. I just dont know an algorithm开发者_运维知识库 in pl/sql that would return me the nearest neighbor ids.
Calculate the distance (Pythagoras) between each point and the selected point and order by the distance. Pseudo sql:
select id from points
order by sqrt(sqr(Data_x - target_x) + sqr(Data_y - target_y))
The first 3 rows are the the nearest 3 points.
nang's answer is a great starting point, and if it does the job, I'd use it. Unfortunately it will most probably require a full table scan (or perhaps a full index scan, if you have a covering index).
If performance becomes a problem, you could probably have a look at making a poor-man's spatial index over the data. It won't be as simple as "create index" but it just might work.
The proper method would be to create a custom index, but that would just be reinventing the sdo_geometry wheel, a path you said you wish to avoid.
A simple-but-rough method (disclaimer: this is just an idea off the top of my head, not tested) might be to create a function-based index that groups all the points in 2D space into square blocks. You basically create an index to map each (x,y) pair onto a list of blocks. Each block would have a defined width and height, and to do a search you would first work out which grid of blocks need to be searched, then query across just the list of points in that grid.
An example index would be something like:
CREATE INDEX grid_block_i ON points (TRUNC(Data_X/100), TRUNC(Data_Y/100), id);
What value you substitute for 100 will depend on the range of values your points take. You will want to divide the plane into a large number of grid blocks, so that the index is reasonably selective; but not so large that a typical query will have to search too many blocks to find candidates.
You could use the index above by using a query like this:
select id
from (select id, Data_X, Data_Y
from points
where TRUNC(Data_X/100) BETWEEN TRUNC(:target_x/100)) - :threshold
AND TRUNC(:target_x/100)) + :threshold
and TRUNC(Data_Y/100) BETWEEN TRUNC(:target_y/100)) - :threshold
AND TRUNC(:target_y/100)) + :threshold
)
order by sqrt(sqr(Data_x - :target_x) + sqr(Data_y - :target_y))
You can then set :threshold to basically eliminate a large set of blocks of points from the query. I reckon if the values for the functional index (i.e. 100) and the threshold are set correctly, you'll see the query use the function-based index to get a small set of candidates, instead of calculating the distance for every single point in the table.
The downside is that if :threshold is too low, the query might return no rows. On the other hand, that might be a useful feature, depending on your needs.
精彩评论