Effectively selecting the closest (distance) record from a database
I've got a database with a 40k venues and growing right now.
Assuming I'm the red dot
I want to be able to retrieve the closest record as quickly as possible.However the distance too the next item could be anything. And there also might be 0-n matches. But do I need to load all 40000 results when I'm just looking for 1?
How can I sort the records by distance? Should it be done in MYSQL, or PHP? This calculation happe开发者_运维问答ns at almost every request, per user, per page, so the solution needs to be quick.
Edit Thanks for the quick and promising answers, I'll need to review these resources, and will accept/comment on answers within a few days.
this problem is covered in this Scribd presentation (theory + math formulas + Mysql): Geo Distance with MySQL
I hope it covers everything you need
The easiest solution is to simple calculate the distance for each record and sort by this value. The problem is: This is very expensive and You can't use an index for that. You can lower the costs by only looking at a subset of your records, perhaps limiting by a bounding box as some posters here suggest.
If you want a clear and fast solution, have a look at the Spatial Extensions of MySQL. These are made exactly for what you want to do. These support:
- A new Column-Type 'Point'
- A special index type optimized for distance-queries
- A distance-operator.
This howto provides some examples:
CREATE TABLE address (
address CHAR(80) NOT NULL,
address_loc POINT NOT NULL,
PRIMARY KEY(address),
SPATIAL KEY(address_loc)
);
CREATE TABLE cab (
cab_id INT AUTO_INCREMENT NOT NULL,
cab_driver CHAR(80) NOT NULL,
cab_loc POINT NOT NULL,
PRIMARY KEY(cab_id),
SPATIAL KEY(cab_loc)
);
SELECT
c.cab_driver,
ROUND(GLength(LineStringFromWKB(LineString(AsBinary(c.cab_loc),
AsBinary(a.address_loc)))))
AS distance
FROM cab c, address a
WHERE a.address = 'Foobar street 110'
ORDER BY distance ASC LIMIT 1;
Create a "bounding box" to use in a WHERE clause in your SQL query as described in this article on Movable Type (with PHP code examples), then include the Haversine formula in your query to calculate the actual distances, and order the result by distance ASC. The nearest venue will then be the first return in the result set.
It's the bounding box that helps your performance, because it means you're only doing the expensive distance calculation on a small subset of your data
If the initial query doesn't return any records, widen the bounding box, and execute the query again until you do get a response.
There is no effective way to find the distance except by trial and error. That is, using MySQL, you can't rank the records by distance from the target then select the top one. The best way is to pick a distance which you think the closest record will be within. Too big a number and you'll get too many records, too small a number and you won't get any. Lets say you pick 40 units.
WHERE xcoord BETWEEN n - 40 AND n + 40 AND ycoord BETWEEN n - 40 AND n + 40
Now you've got all the records with co-ordinates inside an 80 x 80 box, with your target as the centre (the box will be a bit skewed if you're working in latitude and longitude, but that doesn't really matter). Now use the Haversine equation if you're working with latitude and longitude, or Pythagoras if it's just Cartesian, to calculate the distance between the target and each of the points.
精彩评论