Slow SQL Query by Limit/Order dynamic field (coordinates from X point)

2023-03-14 08:20 问答作者：

I'm trying to make a SQL query on a database of 7 million records, the database "geonames" have the "latitude" and "longitude" in decimal(10.7) indexed both, the problem is that the query is too slow:

SELECT SQL_NO_CACHE DISTINCT 
       geonameid, 
       name, 
       (6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance 
  FROM geoNames 
 WHERE (6367.41 * SQRT(2 * (1 - Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude)) * Sin(-0.0669560660943) + cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitud开发者_JS百科e)) * Sin(0.704231626533))) <= '10') 
ORDER BY Distance

The problem is sort by the "Distance" field, which when created dynamically take long to seep into the condition "WHERE", if I remove the condition of the "WHERE ... <= 10" takes only 0.34 seconds, but the result is 7 million records and to transfer data from MySQL to PHP takes almost 120 seconds.

Can you think of any way to make the query to not lose performance by limiting the Distance field, given that the query will very often change the values?

This kind of query cannot use an index but must compute whether the lat/lon of each row falls within the specified distance. Therefore, it is typical that some form of preprocessing is used to limit the scan to a subset of rows. You could create tables corresponding to distance "bands" (2, 5, 8, 10, 20 miles/km -- whatever makes sense for your application requirements) and then populate these bands and keep them up to date. If you want only those medical providers, say, or hotels, or whatever, within 10 miles of a given location, there's no need to worry about the ones that are hundreds or thousands of miles away. With ad hoc queries you could inner join on the "within 10 miles" band, say, and thereby exclude from the comparison scan all rows where the computed distance > 10. When the location varies, the "elegant" way to handle this is to implement an RTREE, but you can define your encompassing region in any arbitrary way you like if you have access to additional data -- e.g. by using zipcodes or counties or states.

There are two things you can do:

Make sure the datatypes are the same on both sides of a comparison: ie compare with 10 (a number), not '10' (a char type) - it will make less work for the DB
In cases like this, I create a view, which means the calculation to be made just once, even if you refer to it more than once in the query

If these two points are incorporated into you code, you get:

CREATE VIEW geoNamesDistance AS
SELECT SQL_NO_CACHE DISTINCT 
       geonameid, 
       name, 
       (6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance 
  FROM geoNames;

SELECT * FROM geoNamesDistance
WHERE Distance <= 10
ORDER BY Distance;

I came up with:

select * from retailer
where latitude is not null and longitude is not null
and pow(2*(latitude - ?), 2) + pow(longitude - ?, 2) < your_magic_distance_value

With this fast & easy flat-Earth code, Los Angeles is closer to Honolulu than San Fransisco, but i doubt customers will consider that when going that far to shop.

继续阅读：sql

Slow SQL Query by Limit/Order dynamic field (coordinates from X point)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？