How can I extend this SQL query to find the k nearest neighbors?
I have a database full of two-dimensional data - points on a map. Each record has a field of the geometry type. What I need to be able to do is pass a point to a stored procedure which returns the k nearest points (k would also be passed to the sproc, but that's easy). I've found a query at http://blogs.msdn.com/isaac/archive/2008/10/23/nearest-neighbors.aspx which gets the single nearest neighbour, but I can't figure how to extend it to find the k nearest neighbours.
This is the current query - T
is the table, g
is the geometry field, @x
is the point to search around, Numbers
is a table with integers 1 to n:
DECLARE @start FLOAT = 1000;
WITH NearestPoints AS
(
SELECT TOP(1) WITH TIES *, T.g.STDistance(@x) AS dist
FROM Numbers JOIN T WITH(INDEX(spatial_index))
ON T.g.STDistance(@x) < @start*POWER(2,Numbers.n)
ORDER BY n
)
SELECT TOP(1) * FROM NearestPoints
ORDER BY n, dist
The inner query selects the nearest non-empty region and the outer query then selects the top result from that region; the outer query can easily be changed to (e.g.) SELECT TOP(20)
, but if the nearest region only contains one re开发者_如何学运维sult, you're stuck with that.
I figure I probably need to recursively search for the first region containing k records, but without using a table variable (which would cause maintenance problems as you have to create the table structure and it's liable to change - there're lots of fields), I can't see how.
What happens if you remove TOP (1) WITH TIES
from the inner query, and set the outer query to return the top k rows?
I'd also be interested to know whether this amendment helps at all. It ought to be more efficient than using TOP
:
DECLARE @start FLOAT = 1000
,@k INT = 20
,@p FLOAT = 2;
WITH NearestPoints AS
(
SELECT *
,T.g.STDistance(@x) AS dist
,ROW_NUMBER() OVER (ORDER BY T.g.STDistance(@x)) AS rn
FROM Numbers
JOIN T WITH(INDEX(spatial_index))
ON T.g.STDistance(@x) < @start*POWER(@p,Numbers.n)
AND (Numbers.n - 1 = 0
OR T.g.STDistance(@x) >= @start*POWER(@p,Numbers.n - 1)
)
)
SELECT *
FROM NearestPoints
WHERE rn <= @k;
NB - untested - I don't have access to SQL 2008 here.
Quoted from Inside Microsoft® SQL Server® 2008: T-SQL Programming. Section 14.8.4.
The following query will return the 10 points of interest nearest to @input:
DECLARE @input GEOGRAPHY = 'POINT (-147 61)';
DECLARE @start FLOAT = 1000;
WITH NearestNeighbor AS(
SELECT TOP 10 WITH TIES
*, b.GEOG.STDistance(@input) AS dist
FROM Nums n JOIN GeoNames b WITH(INDEX(geog_hhhh_16_sidx)) -- index hint
ON b.GEOG.STDistance(@input) < @start*POWER(CAST(2 AS FLOAT),n.n)
AND b.GEOG.STDistance(@input) >=
CASE WHEN n = 1 THEN 0 ELSE @start*POWER(CAST(2 AS FLOAT),n.n-1) END
WHERE n <= 20
ORDER BY n
)
SELECT TOP 10 geonameid, name, feature_code, admin1_code, dist
FROM NearestNeighbor
ORDER BY n, dist;
Note: Only part of this query’s WHERE clause is supported by the spatial index. However, the query optimizer correctly evaluates the supported part (the "<" comparison) using the index. This restricts the number of rows for which the ">=" part must be tested, and the query performs well. Changing the value of @start can sometimes speed up the query if it is slower than desired.
Listing 2-1. Creating and Populating Auxiliary Table of Numbers
SET NOCOUNT ON;
USE InsideTSQL2008;
IF OBJECT_ID('dbo.Nums', 'U') IS NOT NULL DROP TABLE dbo.Nums;
CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY);
DECLARE @max AS INT, @rc AS INT;
SET @max = 1000000;
SET @rc = 1;
INSERT INTO Nums VALUES(1);
WHILE @rc * 2 <= @max
BEGIN
INSERT INTO dbo.Nums SELECT n + @rc FROM dbo.Nums;
SET @rc = @rc * 2;
END
INSERT INTO dbo.Nums
SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max;
精彩评论