开发者

How to group results from a query based on one-to-many association by some criterion in the "many"?

Please forgive the awkward title. I had a hard time distilling my question into one phrase. If anyone can come up with a better one, feel free.

I have the following simplified schema:

vendors
  INT id

locations
  INT id
  INT vendor_id
  FLOAT latitude
  FLOAT longitude

I am perfectly capable of return a list of the nearest vendors, sorted by proximity, limited by an approximation of radius:

SELECT * FROM locations
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
  AND ABS(latitude - 30) + ABS(longitude - 30) < 50
ORDER BY ABS(latitude - 30) + ABS(longitude - 30) ASC

I can't at this moment find my way around the repetition of the order/limit term. I initially attempted aliasing it as "distance" among the SELECT fields, but psql told me that this alias wasn't available in the WHERE clause. Fine. If there's some fancy pants way around this, I'm all ears, but on to my main question:

What I'd like to do is to return a list of vendors, each joined with the closest of its locations, and have this list ordered by proximity and limited by radius.

So supposing I have 2 vendors, each with two locations. I want a query that limits the radius such that only one of the four locations is within it to return that location's associated vendor alongside the vendor itself. If the radius encompassed all the locations, I'd want vendor 1 presented with the closest between its locations and vendor 2 with the closest between its locations, ultimately ordering vendors 1 and 2 based on the proximity of their closest location.

In MySQL, I managed to get the closest location in each vendor's row by using GROUP BY and then MIN(distance). But PostgreSQL seems to be stricter on the usage of GROUP BY.

I'd like to, if possible, avoid meddling with the SELECT clause. I'd also like to, if possible reuse the WHERE and ORDER parts of the above query. But these are by no means absolute requirements.

I have made hackneyed attempts at DISTINCT ON and GROUP BY, but these gave me a fair bit of trouble, mostly in terms of me missing mirrored statements elsewhere, which I won't elaborate in great detail on now.


Solution

I ended up adopting a solution based off OMG Ponies' excellent answer.

SELECT vendors.* FROM (
  SELECT locations.*, 
    ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) AS distance,
    ROW_NUMBER() OVER(PARTITION BY locations.locatable_id, locations.locatable_type
      ORDER BY ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) ASC) AS rank
    FROM locations
    WHERE locations.latitude IS NOT NULL
    AND locations.longitude IS NOT NULL
    AND locations.locatable_type = 'Vendor'
  ) ranked_locations
INNER JOIN vendors ON vendors.id = ranked_locations.locatable_id
WHERE (ranked_locations.rank = 1)
  AND (ranked_locations.distance <= 0.5)
ORDER BY ranked_locations.distance;

Some deviations from OMG Ponies' solution:

  • Locations are now polymorphically associated via _type. A bit of a premise change.
  • I moved the join outside the subquery. I don't know if there are performance implications, but it made sense in my mind to see the subquery as a getting of locations and partitioned rankings and then the larger query as an act of bri开发者_C百科nging it all together.
  • minor Took away table name aliasing. Although I'm plenty used to aliasing, it just made it harder for me to follow along. I'll wait until I'm more experienced with PostgreSQL before working in that flair.


For PostgreSQL 8.4+, you can use analytics like ROW_NUMBER:

SELECT x.*
  FROM (SELECT v.*,
               t.*,
               ABS(t.latitude - 30) + ABS(t.longitude - 30) AS distance,
               ROW_NUMBER() OVER(PARTITION BY v.id
                                     ORDER BY ABS(t.latitude - 30) + ABS(t.longitude - 30)) AS rank
          FROM VENDORS v
          JOIN LOCATIONS t ON t.vendor_id = v.id
         WHERE t.latitude IS NOT NULL 
           AND t.longitude IS NOT NULL) x
  WHERE x.rank = 1
    AND x.distance < 50
ORDER BY x.distance

I left the filtering on distance, in case the top ranked value was over 50 so the vendor would not appear. Remove the distance check being less than 50 portion if you don't want this to happen.

ROW_NUMBER will return a distinct sequential value that resets for every vendor in this example. If you want duplicates, you'd need to look at using DENSE_RANK.

See this article for emulating ROW_NUMBER on PostgreSQL pre-8.4.


MySQL extends GROUP BY and not all columns are required to be aggregates. http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html

I have seen many questions here with the same issue. The trick is to get the nececssary columns in a subquery and then self join it in the outer query:

create temp table locations (id int, vender_id int, latitude int, longitude int);
CREATE TABLE
insert into locations values
        (1, 1, 50, 50),
        (2, 1, 35, 30),
        (3, 2, 5, 30)
;
SELECT
     locations.*, distance
     FROM
     (
          SELECT 
              vender_id,
              MIN(ABS(latitude - 30) + ABS(longitude - 30)) as distance
              FROM locations
              WHERE latitude IS NOT NULL AND longitude IS NOT NULL
                  GROUP BY vender_id
      ) AS min_locations
      JOIN locations ON
           ABS(latitude - 30) + ABS(longitude - 30) = distance
           AND min_locations.vender_id = locations.vender_id
       WHERE distance < 50
       ORDER BY distance
;
 id | vender_id | latitude | longitude | distance 
----+-----------+----------+-----------+----------
  2 |         1 |       35 |        30 |        5
  3 |         2 |        5 |        30 |       25
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜