How does SELECT DISTINCT work in MySQL?
I have a table开发者_JAVA技巧 with multiple rows which have a same data. I used SELECT DISTINCT
to get a unique row and it works fine. But when i use ORDER BY
with SELECT DISTINCT
it gives me unsorted data.
Can anyone tell me how distinct works?
Based on what criteria it selects the row?
From your comment earlier, the query you are trying to run is
Select distinct id from table where id2 =12312 order by time desc.
As I expected, here is your problem. Your select column and order by column are different. Your output rows are ordered by time, but that order doesn't necessarily need to preserved in the id column. Here is an example.
id | id2 | time
-------------------
1 | 12312 | 34
2 | 12312 | 12
3 | 12312 | 48
If you run
SELECT * FROM table WHERE id2=12312 ORDER BY time DESC
you will get the following result
id | id2 | time
-------------------
2 | 12312 | 12
1 | 12312 | 34
3 | 12312 | 48
Now if you select only the id column from this, you will get
id
--
2
1
3
This is why your results are not sorted.
When you specify SELECT DISTINCT
it will give you all the rows, eliminating duplicates from the result set. By "duplicates" I mean rows where all fields have the same values. For example, say you have a table that looks like:
id | num
--------------
1 | 1
2 | 3
3 | 3
SELECT DISTINCT *
would return all rows above, whereas SELECT DISTINCT num
would return two rows:
num
-----
1
3
Note that which row actual row (eg: whether it's row 2 or row 3) it selects is irrelevant, as the result would be indistinguishable.
Finally, DISTINCT
should not affect how ORDER BY
works.
Reference: MySQL SELECT
statement
The behaviour you describe happens when you ORDER BY
an expression that is not present in the SELECT
clause. The SQL standard does not allow such a query but MySQL is less strict and allows it.
Let's try an example:
SELECT DISTINCT colum1, column2
FROM table1
WHERE ...
ORDER BY column3
Let's say the content of the table table1
is:
id | column1 | column2 | column3
----+---------+---------+---------
1 | A | B | 1
2 | A | B | 5
3 | X | Y | 3
Without the ORDER BY
clause, the above query returns following two records (without ORDER BY
the order is not guaranteed):
column1 | column2
---------+---------
A | B
X | Y
But with ORDER BY column3
the order is also not guaranteed.
The DISTINCT
clause operates on the values of the expressions present in the SELECT
clause. If row #1 is processed first then (A
, B
) is placed in the result set and it is associated with row #1. Then, when row #2 is processed, the values of the SELECT
expressions produce the record (A
, B
) that is already in the result set. Because of DISTINCT
it is dropped. Row #3 produces (X
, Y
) that is also put in the result set. Then, the ORDER BY column3
clause makes the records be sorted in the result set as (A
, B
), (X
, Y
).
But if row #2 is processed before row #1 then, following the same logic exposed in the previous paragraph, the records in the result set are sorted as (X
, Y
), (A
, B
).
There is no rule imposed on the database engine about the order it processes the rows when it runs a query. The database is free to process the rows in any order it consider it's better for performance.
Your query is invalid SQL and the fact that it can return different results using the same input data proves it.
精彩评论