开发者

Selecting columns with DISTINCT in PostgreSQL

I'm querying bus stops from a database, and I wish to have it only return 1 stop per bus line/direction. This query does just that:

Stop.select("DISTINCT line_id, direction")

Except that it won't give me any other attribute than those 2. I tried a couple of other queries to have it return the id in addition to the line_id and direction fields (ideally it would return all columns), with no luck:

Stop.select("DISTINCT line_id, direction, id")

and

Stop.select("DISTINCT(line_id || '-' || direction), id")

In both cases, the query loses its distinct clause and all rows are returned.

Some awesome dude helped me out and suggested to use a 开发者_运维问答subquery to have it return all the ids:

Stop.find_by_sql("SELECT DISTINCT a1.line_id, a1.direction, (SELECT a2.id from stops a2 where a2.line_id = a1.line_id AND a2.direction = a1.direction ORDER BY a2.id ASC LIMIT 1) as id FROM stops a1

I can then extract all the ids and perform a 2nd query to fetch the full attributes for each stop.

Is there a way to have it all inside 1 query AND have it return all the attributes?


Stop.select("DISTINCT ON (line_id, direction) *")


Not so fast - The other answer selects stop_id arbitrary

This is why your question makes no sense. We can pull stop_ids and have distinct line_id and direction. But we have no idea why we have the stop_id we do.

    create temp table test( line_id integer, direction char(1), stop_id      integer);
    insert into test values
            (1, 'N', 1),
            (1, 'N', 2),
            (1, 'S', 1),
            (1, 'S', 2),
            (2, 'N', 1),
            (2, 'N', 2),
            (2, 'S', 1),
            (2, 'S', 2)
    ;
    select distinct on (line_id, direction) * from test;
    -- do this again but will reverse the order of stop_ids
    -- could it possible change our Robust Query?!!!
    drop table test;
    create temp table test(line_id integer,direction char(1),stop_id integer);
    insert into test values
            (1, 'N', 2),
            (1, 'N', 1),
            (1, 'S', 2),
            (1, 'S', 1),
            (2, 'N', 2),
            (2, 'N', 1),
            (2, 'S', 2),
            (2, 'S', 1)
    ;
    select distinct on (line_id, direction) * from test;

First select:

line_id | direction | stop_id 
---------+-----------+---------
       1 | N         |       1
       1 | S         |       1
       2 | N         |       1
       2 | S         |       1

Second select:

line_id | direction | stop_id 
---------+-----------+---------
       1 | N         |       2
       1 | S         |       2
       2 | N         |       2
       2 | S         |       2

So we got away without grouping stop_id but we have no guarantees why we got the one we did. All we know is that this is valid stop_id. Any updates, inserts, and other stuff that no RDMS will guarantee can be changing around the physical order of rows.

This is what I meant in the top comment. There is no known reason for pulling one stop_id over the other one, but somehow you need this stop_id (or whatever else) desperately.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜