Slow SQL Query by Limit/Order dynamic field (coordinates from X point)

I'm trying to run a SQL query against a database of 7 million records. The "geonames" database has "latitude" and "longitude" columns stored as decimal(10,7), both indexed. The problem is that the query is too slow:

SELECT SQL_NO_CACHE DISTINCT 
       geonameid, 
       name, 
       (6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance 
  FROM geoNames 
 WHERE (6367.41 * SQRT(2 * (1 - Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude)) * Sin(-0.0669560660943) + cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533))) <= '10') 
ORDER BY Distance

The problem is sorting by the "Distance" field: because it is computed dynamically, the query takes a long time once the same expression is also used in the "WHERE" condition. If I remove the "WHERE ... <= 10" condition, the query takes only 0.34 seconds, but the result is 7 million records, and transferring that data from MySQL to PHP takes almost 120 seconds.

Can you think of any way to limit on the Distance field without losing performance, given that the values in the query will change very often?


This kind of query cannot use an index but must compute whether the lat/lon of each row falls within the specified distance. Therefore, it is typical that some form of preprocessing is used to limit the scan to a subset of rows. You could create tables corresponding to distance "bands" (2, 5, 8, 10, 20 miles/km -- whatever makes sense for your application requirements) and then populate these bands and keep them up to date. If you want only those medical providers, say, or hotels, or whatever, within 10 miles of a given location, there's no need to worry about the ones that are hundreds or thousands of miles away. With ad hoc queries you could inner join on the "within 10 miles" band, say, and thereby exclude from the comparison scan all rows where the computed distance > 10. When the location varies, the "elegant" way to handle this is to implement an RTREE, but you can define your encompassing region in any arbitrary way you like if you have access to additional data -- e.g. by using zipcodes or counties or states.
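One simple way to "limit the scan to a subset of rows" is a bounding-box prefilter on the two indexed columns. Note that this is a swapped-in technique, not the band tables described above. The sketch below is a minimal illustration under stated assumptions: the function name `bounding_box`, the `PREFILTER_SQL` string, and the `%s` placeholder style are hypothetical, and the box is a small-radius approximation that ignores the poles and the 180-degree meridian. The exact distance expression would still be applied to the rows that survive the box.

```python
import math

EARTH_RADIUS_KM = 6367.41  # same radius constant the question's formula uses


def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees.

    Assumption: small radius, away from the poles and the 180-degree meridian.
    """
    # Angular radius of the search circle, expressed in degrees of latitude.
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude covers less ground by a factor of cos(latitude),
    # so the longitude span has to be widened accordingly.
    dlon = dlat / math.cos(math.radians(lat_deg))
    return (lat_deg - dlat, lat_deg + dlat, lon_deg - dlon, lon_deg + dlon)


# Hypothetical prefilter query; the placeholder style (%s) depends on the driver.
PREFILTER_SQL = (
    "SELECT geonameid, name, latitude, longitude FROM geoNames "
    "WHERE latitude BETWEEN %s AND %s AND longitude BETWEEN %s AND %s"
)

# Centre point taken from the radian constants in the question's query.
center_lat = math.degrees(0.704231626533)
center_lon = math.degrees(-0.0669560660943)
box = bounding_box(center_lat, center_lon, 10)
print(box)
```

The four values in `box` would be bound to the placeholders in the order shown. If exactness at the edge of the box matters, padding the radius slightly before computing the box is a cheap safeguard.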


There are two things you can do:

  • Make sure the datatypes are the same on both sides of a comparison, i.e. compare with 10 (a number), not '10' (a char type); it makes less work for the DB
  • In cases like this, I create a view, which means the calculation is made just once, even if you refer to it more than once in the query

If these two points are incorporated into your code, you get:

CREATE VIEW geoNamesDistance AS
SELECT SQL_NO_CACHE DISTINCT 
       geonameid, 
       name, 
       (6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance 
  FROM geoNames;

SELECT * FROM geoNamesDistance
WHERE Distance <= 10
ORDER BY Distance;


I came up with:

select * from retailer
where latitude is not null and longitude is not null
and pow(2*(latitude - ?), 2) + pow(longitude - ?, 2) < your_magic_distance_value

With this fast and easy flat-Earth code, Los Angeles is closer to Honolulu than San Francisco is, but I doubt customers will consider that when going that far to shop.
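The distortion can be checked numerically. The sketch below compares the flat-Earth score from the query with a haversine great-circle distance. The city coordinates are approximate values assumed for illustration, and `flat_score` and `great_circle_km` are hypothetical helper names.

```python
import math

# Approximate city-centre coordinates in degrees (assumed values).
LOS_ANGELES = (34.05, -118.24)
SAN_FRANCISCO = (37.77, -122.42)
HONOLULU = (21.31, -157.86)


def flat_score(a, b):
    """Squared flat-Earth score used in the query: (2*dlat)^2 + dlon^2."""
    return (2 * (a[0] - b[0])) ** 2 + (a[1] - b[1]) ** 2


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


for name, city in (("Los Angeles", LOS_ANGELES), ("San Francisco", SAN_FRANCISCO)):
    print(name, round(flat_score(city, HONOLULU)), round(great_circle_km(city, HONOLULU)))
```

With these coordinates the flat score ranks Los Angeles as nearer to Honolulu, while the great-circle distance ranks San Francisco as nearer. At shopping-trip distances the two measures agree closely enough for a cutoff like the one in the query.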
