Distance between two coordinates, how can I simplify this and/or use a different technique?

2023-02-05 08:27

I need to write a query which allows me to find all locations within a range (Miles) from a provided location.

The table is like this:

id  |  name  |  lat  |  lng

After some research I found this MySQL presentation.

I have tested it on a table with around 100 rows, but there will be many more, so it must be scalable.

I tried something more simple like this first:

-- just some test data; in production these values would come from user input
set @orig_lat=55.857807; set @orig_lng=-4.242511; set @dist=10;

SELECT *, 3956 * 2 * ASIN(
          SQRT( POWER(SIN((orig.lat - abs(dest.lat)) * pi()/180 / 2), 2) 
              + COS(orig.lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)  
              * POWER(SIN((orig.lng - dest.lng) * pi()/180 / 2), 2) )) 
          AS distance
  FROM locations dest, locations orig
 WHERE orig.id = '1'
HAVING distance < 1
 ORDER BY distance;

This returned rows in around 50ms, which is pretty good! However, it would slow down dramatically as the number of rows grows.

EXPLAIN shows it's only using the PRIMARY key, which is to be expected.

Then, after reading the article linked above, I tried something like this:

-- defining variables - once this becomes a stored procedure, the values
-- will be fetched with a SELECT query.
set @mylon = -4.242511;
set @mylat = 55.857807;
set @dist = 0.5;

-- calculate lon and lat for the rectangle:
set @lon1 = @mylon-@dist/abs(cos(radians(@mylat))*69);
set @lon2 = @mylon+@dist/abs(cos(radians(@mylat))*69);
set @lat1 = @mylat-(@dist/69); 
set @lat2 = @mylat+(@dist/69);

-- run the query:

SELECT *, 3956 * 2 * ASIN(
          SQRT( POWER(SIN((@mylat - abs(dest.lat)) * pi()/180 / 2) ,2)
              + COS(@mylat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
              * POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) ))
          AS distance
  FROM locations dest
 WHERE dest.lng BETWEEN @lon1 AND @lon2
   AND dest.lat BETWEEN @lat1 AND @lat2
HAVING distance < @dist
 ORDER BY distance;

This query takes around 240ms, which is not too bad, but it is slower than the first. I imagine that with a much higher number of rows it would work out faster. However, EXPLAIN shows the possible keys as lat, lng or PRIMARY, and it used PRIMARY.
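To make the two-stage idea easier to follow outside SQL, here is a minimal Python sketch of the same approach: a cheap rectangle test first, then the haversine distance only for rows that survive it. The function names and the sample rows are made up for illustration; the constants 3956 and 69 are the ones used in the queries above.

```python
import math

EARTH_RADIUS_MILES = 3956.0  # same constant as in the queries above
MILES_PER_DEGREE = 69.0      # same approximation as in the queries above


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def within_radius(rows, lat, lng, dist):
    """Return (distance, row) pairs closer than dist miles, nearest first.

    rows is an iterable of (id, name, lat, lng) tuples.
    """
    lat_delta = dist / MILES_PER_DEGREE
    lng_delta = dist / abs(math.cos(math.radians(lat)) * MILES_PER_DEGREE)
    hits = []
    for row in rows:
        _, _, row_lat, row_lng = row
        # Stage 1: rectangle test, no trigonometry per row.
        if abs(row_lat - lat) > lat_delta or abs(row_lng - lng) > lng_delta:
            continue
        # Stage 2: exact spherical distance for the survivors only.
        d = haversine_miles(lat, lng, row_lat, row_lng)
        if d < dist:
            hits.append((d, row))
    hits.sort(key=lambda pair: pair[0])
    return hits


# Hypothetical sample data, not taken from the real table.
rows = [
    (1, "origin", 55.857807, -4.242511),
    (2, "near", 55.860000, -4.250000),
    (3, "far", 55.950000, -3.190000),
]
result = within_radius(rows, 55.857807, -4.242511, 0.5)
```

In the database the rectangle test is the part an index on lat or lng can serve; the trigonometry then only runs on the rows left inside the box.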

How can I do this better???

I know I could store the lat/lng as a POINT(), but I haven't found much documentation showing whether that is faster or more accurate.

Any other ideas would be happily accepted!

Thanks very much!

-Stefan

UPDATE:

As Jonathan Leffler pointed out I had made a few mistakes which I hadn't noticed:

I had only put abs() on one of the lat values. I was also using an id search in the WHERE clause of the second query, when there was no need. The first query was purely experimental; the second one is more likely to hit production.

After these changes, EXPLAIN shows the query now uses the key on the lng column, and the average response time is around 180ms, which is an improvement.

Any other ideas would be happily accepted!

If you want speed (and simplicity) you'll want some decent geospatial support from your database. This introduces geospatial datatypes, geospatial indexes and (a lot of) functions for processing / building / analyzing geospatial data.

MySQL implements part of the OpenGIS specification, although it is (or at least was, the last time I checked) very rough around the edges and too immature for any real work.

PostGIS on PostgreSQL would make this trivially easy and readable:

(this finds all points from tableb which are closer than 1000 meters to point a in tablea with id 123, assuming the geometries are stored in a meter-based coordinate system)

select 
    myvalue
from 
    tablea, tableb
where 
    st_dwithin(tablea.the_geom, tableb.the_geom, 1000)
and
    tablea.id = 123

The first query ignores the parameters you set - using 1 instead of @dist for the distance, and using the table alias orig instead of the parameters @orig_lat and @orig_lng.

You then have the query doing a Cartesian product between the table and itself, which is seldom a good idea if you can avoid it. You get away with it because of the filter condition orig.id = 1, which means that there's only one row from orig joined with each of the rows in dest (including the point with dest.id = 1; you should probably have a condition AND orig.id != dest.id). You also have a HAVING clause but no GROUP BY clause, which is indicative of problems. The HAVING clause is not relating any aggregates, but a HAVING clause is (primarily) for comparing aggregate values.

Unless my memory is failing me, COS(ABS(x)) === COS(x), so you might be able to simplify things by dropping the ABS(). Failing that, it is not clear why one latitude needs the ABS and the other does not - symmetry is crucial in matters of spherical trigonometry.
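A quick way to check both halves of that point is a few lines of Python. This is only an illustrative sketch with made-up latitudes: it confirms that ABS() inside COS() is harmless, and shows that ABS() on only one latitude inside the difference term is not.

```python
import math


def half_angle_term(lat_a, lat_b):
    """sin^2 of half the latitude difference, as used in the haversine formula."""
    return math.sin(math.radians(lat_a - lat_b) / 2) ** 2


# cos is an even function, so wrapping its argument in abs() changes nothing.
for lat in (-33.9, 0.0, 55.857807):
    assert math.isclose(math.cos(math.radians(abs(lat))),
                        math.cos(math.radians(lat)))

# abs() on one side of the difference does change the result as soon as
# one of the points lies in the southern hemisphere.
north, south = 10.0, -10.0
correct = half_angle_term(north, south)        # points are 20 degrees apart
with_abs = half_angle_term(north, abs(south))  # treated as 0 degrees apart
```

So for tables that only hold northern-hemisphere points the asymmetric ABS() happens to do no harm, but it gives wrong distances once negative latitudes appear.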

You have a dose of the magic numbers - the value 69 is presumably the number of miles in a degree (of longitude, at the equator), and 3956 is the radius of the earth in miles.

I'm suspicious of the box calculated if the given position is close to a pole. In the extreme case, you might need to allow any longitude at all.
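One possible way to guard against that, sketched in Python: clamp the latitudes, and fall back to the full longitude range whenever the box reaches a pole or the computed longitude half-width gets out of hand. This is a rough heuristic, not a geodesically exact bound; the function name is hypothetical, 69 miles per degree is the same approximation as in the question, and wrap-around at the 180° meridian is deliberately not handled.

```python
import math

MILES_PER_DEGREE = 69.0  # same approximation as in the question


def bounding_box(lat, lng, dist_miles):
    """Return (lat_min, lat_max, lng_min, lng_max) in degrees.

    Heuristic sketch: does not handle boxes crossing the 180 degree meridian.
    """
    lat_delta = dist_miles / MILES_PER_DEGREE
    lat_min = max(lat - lat_delta, -90.0)
    lat_max = min(lat + lat_delta, 90.0)

    # The box touches a pole: every longitude is potentially in range.
    if lat_min <= -90.0 or lat_max >= 90.0:
        return lat_min, lat_max, -180.0, 180.0

    # Use the box edge furthest from the equator, where a degree of
    # longitude is shortest, so the box is widened rather than narrowed.
    widest_lat = max(abs(lat_min), abs(lat_max))
    lng_delta = dist_miles / (MILES_PER_DEGREE * math.cos(math.radians(widest_lat)))
    if lng_delta >= 180.0:
        return lat_min, lat_max, -180.0, 180.0
    return lat_min, lat_max, lng - lng_delta, lng + lng_delta


glasgow_box = bounding_box(55.857807, -4.242511, 0.5)
polar_box = bounding_box(89.999, 0.0, 10.0)
```

For the Glasgow test point this gives a box only a few hundredths of a degree wide; for a point almost on the pole it degrades to "any longitude", leaving the exact distance check to do the real filtering.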

The condition dest.id = 1 in the second query is odd; I believe it should be omitted, but its presence should speed things up, because only one row matches that condition. So the extra time taken is puzzling. But using the primary key index is appropriate as written.

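You should move the condition in the HAVING clause into the WHERE clause. (MySQL does not let WHERE refer to a SELECT alias such as distance, so that means repeating the expression or wrapping the query in a derived table.)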

But I'm not sure this is really helping...

The NGS Online Inverse Geodesic Calculator is the traditional reference means to calculate the distance between any two locations on the earth ellipsoid:

http://www.ngs.noaa.gov/cgi-bin/Inv_Fwd/inverse2.prl

But the above calculator is still problematic. Between two near-antipodal locations in particular, the computed distance can be off by some tens of kilometres. The origin of the numerical trouble was identified long ago by Thaddeus Vincenty (page 92):

http://www.ngs.noaa.gov/PUBS_LIB/inverse.pdf

In any case, it is preferable to use the reliable and very accurate online calculator by Charles Karney:

http://geographiclib.sourceforge.net/cgi-bin/Geod

Some thoughts on improving performance. It wouldn't simplify things from a maintainability standpoint (makes things more complex), but it could help with scalability.

Since you know the radius, you can add conditions for the bounding box, which may allow the db to optimize the query to eliminate some rows without having to do the trig calcs.

You could pre-calculate some of the trig values of the lat/lon of stored locations and store them in the table. This would shift some of the performance cost to the time the record is inserted, but if queries outnumber inserts, this would be good. See this answer for an idea of this approach: Query to get records based on Radius in SQLite?

You could look at something like geohashing.
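The pre-calculation suggestion can be sketched as follows, in Python rather than SQL. It uses the spherical law of cosines instead of the haversine formula from the question, because that form only needs sines and cosines of the stored coordinates. The column names (sin_lat, cos_lat, sin_lng, cos_lng) and function names are hypothetical, and the sample points are made up.

```python
import math

EARTH_RADIUS_MILES = 3956.0  # same constant as in the question


def precompute(lat, lng):
    """Values that would be stored next to each row at insert time."""
    phi, lmb = math.radians(lat), math.radians(lng)
    return {
        "sin_lat": math.sin(phi), "cos_lat": math.cos(phi),
        "sin_lng": math.sin(lmb), "cos_lng": math.cos(lmb),
    }


def cos_central_angle(a, b):
    """Cosine of the angle between two points, using stored values only."""
    # cos(lng_a - lng_b) expanded with the angle-difference identity.
    cos_dlng = a["cos_lng"] * b["cos_lng"] + a["sin_lng"] * b["sin_lng"]
    return a["sin_lat"] * b["sin_lat"] + a["cos_lat"] * b["cos_lat"] * cos_dlng


def is_within(a, b, dist_miles):
    """True when the central angle is at most dist / R.

    cos() decreases on [0, pi], so the comparison is flipped.
    """
    threshold = math.cos(dist_miles / EARTH_RADIUS_MILES)  # once per query
    return cos_central_angle(a, b) >= threshold


origin = precompute(55.857807, -4.242511)
near = precompute(55.860000, -4.250000)
far = precompute(55.950000, -3.190000)
```

The per-row work is then a handful of multiplications and one comparison. The law of cosines loses precision for points only a few metres apart, so this is better suited to radii of the size in the question than to very fine distinctions.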

When used in a database, the structure of geohashed data has two advantages. ... Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes.

You could search SO for some ideas on how to implement: https://stackoverflow.com/search?q=geohash
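For a feel of what would be stored, here is a minimal sketch of the standard geohash encoding in Python, assuming the usual base-32 alphabet and bit interleaving that starts with longitude. The function name and the default precision are arbitrary choices for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lng, precision=6):
    """Encode a lat/lng pair (degrees) as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = (bits << 1) | 1
                lng_lo = mid
            else:
                bits <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


example = geohash_encode(57.64911, 10.40744, 5)
```

Stored in an indexed string column, the hash allows a prefix search (LIKE 'u4pr%') to act as a coarse proximity filter. Points just either side of a cell boundary can have quite different hashes, so neighbouring cells usually need checking as well, followed by an exact distance test.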

If you're only interested in rather small distances, you can approximate the geographical grid by a rectangular grid.

SELECT *, SQRT(POWER(RADIANS(@mylat - dest.lat), 2) +
               POWER(RADIANS(@mylon - dest.lng)*COS(RADIANS(@mylat)), 2)
              )*@radiusOfEarth AS approximateDistance
…

You could make this even more efficient by storing radians instead of (or in addition to) degrees in your database. If your queries may cross the 180° meridian, some extra care would be necessary there, but many applications don't have to deal with those locations. You could also try to change POWER(x, 2) to x*x, which might get computed faster.
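To get a feel for how close the flat approximation is at short range, here is a small Python comparison against the haversine formula. The function names and the second test point are made up; 3956 stands in for @radiusOfEarth, in miles as in the question.

```python
import math

EARTH_RADIUS_MILES = 3956.0  # stands in for @radiusOfEarth


def approx_distance_miles(lat1, lng1, lat2, lng2):
    """Flat-grid approximation; only reasonable for small distances."""
    dlat = math.radians(lat1 - lat2)
    # Longitude differences shrink with the cosine of the latitude.
    dlng = math.radians(lng1 - lng2) * math.cos(math.radians(lat1))
    # x*x instead of a power function, as suggested above.
    return math.sqrt(dlat * dlat + dlng * dlng) * EARTH_RADIUS_MILES


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance, used here only as the reference value."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


approx = approx_distance_miles(55.857807, -4.242511, 55.860000, -4.250000)
exact = haversine_miles(55.857807, -4.242511, 55.860000, -4.250000)
relative_error = abs(approx - exact) / exact
```

For two points a fraction of a mile apart, the two results agree to well under a tenth of a percent; the gap widens as the distance grows or as the points approach a pole.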
