Fastest way (and most optimized) to calculate and sort users by ascending order of zip code distance
I have a system which will return all users from the database and order the results by lowest distance from a reference zip code.
For example: User will come on the site, enter zip code and it will return him all other users who are nearest to his zip (ascending order)
How am i doing this now and why is it a problem ?
The system contains more than 30 million users and their zipcodes. I am retreiving all the users in a particular state and city (narrows the dataset down to about 10,000).
This is where the problem actually happens. Now, all the result sent by mysql (10,000) rows to PHP are sent to a zipcode calculator library which calculates this distance between the base zip code and user's zip code - 10,000 times. Then orders the result by the zip code nearest.
As you can see, this is very badly optimized code. And the 10,000 records are looped through twice. Not to mention the amount of RAM each httpd process takes just transferring data to and fro mysql.
What I would like to ask the gurus in here that is there anyway to optimize this ?
I have a few ideas of my own, but i'm not sure how efficient they are.
Try to do all the zipcode calculation and ordering in mysql itself and return the paginated number of rows. For this, i will need to move the distance between zipcode calculation logic to a stored procedure. This way I am preventing the processing of 10,000 records in PHP. However, there is still a problem. I would not need to calculate distance for zip codes which have already been cal开发者_Go百科culated (for 2 users having the same zip code).
Secondly, how do i order rows in mysql using a stored procedure ?
What do you guys think ? Is this a good way ? Can i expect a performance boost using this ? Do you have any other suggestions ?
I know this question is huge, and i really appreciate the time you have taken to read till the end. I would really like to hear your thoughts about this.
As I'm not overly familiar with PHP or MySQL, I can only give some basic tips but they should help. This also assumes you have no direct way of interfacing with the zip library from MySQL.
First, as it's doubtful that you have 10k zip codes in a city, take your existing query and do something like
SELECT DISTINCT ZipCode FROM Users WHERE ...
This will probably return a few dozen zip codes max, and no duplicates. Run this through your zip code library. That library itself is probably a source of slowness, as it has to look up the zip codes, and do a bunch of fancy trig to get actual distance. Take the results of this, and insert it into a temp table with just the zip code and the distance.
Once done with that list, have another query that gets the rest of the user data you want, and JOIN into the the temp table on zip code to get your distance.
This should give you quite a large speedup. You can do whatever paging you need in the second query after the results have been calculated. And no more looping through 10k rows.
I suggest that you narrow the latitude and longitude ranges before you compute the accurate distance for filtering and sorting purposes.
What I mean is if you do a full table scan and compute distances for all zip codes in the database relative to your reference point, it will be very slow.
Instead, filter zipcode by proximity. I mean if you have latitude 10 and longitude 20, first compute the maximum angular range for the proximity you want. Lets say you want a proximity range of 10 miles. That may translate into 0.15 degrees. So you need to filter you zip codes first latitude between 10-0.15 and 10+0.15 and longitude between 20-0.15 and 20+0.15 .
Only after that you include the accurate distance clause in your SQL query condition. That will be much faster because you no longer do full scan and you can eventually use range indexes on longitude and latitude fields.
To translate miles into degrees find the narrow range, keep in mind that the Earth has , approximately 25,000 miles of perimeter, divide 25000 by 360 degrees which gives 70 miles per degree. If you want a range of 10 miles, your range in degrees will be at most 0.15 degrees.
Keep in mind that these calculations are not accurate (the Earth is not exactly well rounded) but that is not important. What is important is that you find a degree range value that is higher than the really accurate value.
If you can get the latitude and longitude for all zipcodes into MySQL, or have an easy way of fetching the lat/long for your base zipcode and feeding it into your MySQL query, then you can order your 10k users by distance inside MySQL. There is a very similar question and answer here which gives you the correct math for the distance function. You may also want to investigate Mysql spatial extensions which would let you insert and index your lat/longs as 2D POINT data.
精彩评论