Selecting columns with DISTINCT in PostgreSQL
I'm querying bus stops from a database, and I wish to have it only return 1 stop per bus line/direction. This query does just that:
Stop.select("DISTINCT line_id, direction")
Except that it won't give me any other attribute than those 2. I tried a couple of other queries to have it return the id
in addition to the line_id
and direction
fields (ideally it would return all columns), with no luck:
Stop.select("DISTINCT line_id, direction, id")
and
Stop.select("DISTINCT(line_id || '-' || direction), id")
In both cases, the query loses its distinct clause and all rows are returned.
Some awesome dude helped me out and suggested to use a 开发者_运维问答subquery to have it return all the ids:
Stop.find_by_sql("SELECT DISTINCT a1.line_id, a1.direction, (SELECT a2.id from stops a2 where a2.line_id = a1.line_id AND a2.direction = a1.direction ORDER BY a2.id ASC LIMIT 1) as id FROM stops a1
I can then extract all the ids and perform a 2nd query to fetch the full attributes for each stop.
Is there a way to have it all inside 1 query AND have it return all the attributes?
Stop.select("DISTINCT ON (line_id, direction) *")
Not so fast - The other answer selects stop_id arbitrary
This is why your question makes no sense. We can pull stop_ids and have distinct line_id and direction. But we have no idea why we have the stop_id we do.
create temp table test( line_id integer, direction char(1), stop_id integer);
insert into test values
(1, 'N', 1),
(1, 'N', 2),
(1, 'S', 1),
(1, 'S', 2),
(2, 'N', 1),
(2, 'N', 2),
(2, 'S', 1),
(2, 'S', 2)
;
select distinct on (line_id, direction) * from test;
-- do this again but will reverse the order of stop_ids
-- could it possible change our Robust Query?!!!
drop table test;
create temp table test(line_id integer,direction char(1),stop_id integer);
insert into test values
(1, 'N', 2),
(1, 'N', 1),
(1, 'S', 2),
(1, 'S', 1),
(2, 'N', 2),
(2, 'N', 1),
(2, 'S', 2),
(2, 'S', 1)
;
select distinct on (line_id, direction) * from test;
First select:
line_id | direction | stop_id
---------+-----------+---------
1 | N | 1
1 | S | 1
2 | N | 1
2 | S | 1
Second select:
line_id | direction | stop_id
---------+-----------+---------
1 | N | 2
1 | S | 2
2 | N | 2
2 | S | 2
So we got away without grouping stop_id but we have no guarantees why we got the one we did. All we know is that this is valid stop_id. Any updates, inserts, and other stuff that no RDMS will guarantee can be changing around the physical order of rows.
This is what I meant in the top comment. There is no known reason for pulling one stop_id over the other one, but somehow you need this stop_id (or whatever else) desperately.
精彩评论