Are these two queries the same - GROUP BY vs. DISTINCT?

2023-01-09 00:54 Q&A author:

These two queries seem to return the same results. Is that coincidental or are they really the same?

SELECT t.ItemNumber,
  (SELECT TOP 1 ItemDescription
   FROM Transactions
   WHERE ItemNumber = t.ItemNumber
   ORDER BY DateCreated DESC) AS ItemDescription
FROM Transactions t
GROUP BY t.ItemNumber

SELECT DISTINCT(t.ItemNumber),
  (SELECT TOP 1 ItemDescription
   FROM Transactions
   WHERE ItemNumber = t.ItemNumber
   ORDER BY DateCreated DESC) AS ItemDescription
FROM Transactions t

A bit of explanation: I'm trying to get a distinct list of items from a table full of transactions. For each item, I'm looking for the ItemNumber (the identifying field) and the most recent ItemDescription.

Your example #2 had me scratching my head for a while - I thought to myself: "You can't DISTINCT a single column, what would that mean?" - until I realised what is going on.

When you have

SELECT DISTINCT(t.ItemNumber)

you are not, despite appearances, actually asking for distinct values of t.ItemNumber! Your example #2 actually gets parsed the same as

SELECT DISTINCT
  (t.ItemNumber)
  ,
  (SELECT TOP 1 ItemDescription
   FROM Transactions
   WHERE ItemNumber = t.ItemNumber
   ORDER BY DateCreated DESC) AS ItemDescription
FROM Transactions t

with syntactically-correct but superfluous parentheses around t.ItemNumber. It is to the result-set as a whole that DISTINCT applies.
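
To see that DISTINCT compares whole rows rather than the parenthesised column, here is a minimal sketch. The table variable @Demo and its sample rows are made up for illustration and are not part of the question's data; the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample rows, declared inline so the snippet is self-contained.
DECLARE @Demo TABLE (ItemNumber INT, ItemDescription VARCHAR(50));

INSERT INTO @Demo (ItemNumber, ItemDescription)
VALUES (1, 'Widget'),
       (1, 'Widget'),      -- exact duplicate of the row above
       (1, 'Widget v2'),   -- same ItemNumber, different description
       (2, 'Gadget');

-- The parentheses only wrap an expression; DISTINCT still compares
-- (ItemNumber, ItemDescription) pairs.
SELECT DISTINCT (ItemNumber), ItemDescription
FROM @Demo;
-- Three rows come back: (1, 'Widget'), (1, 'Widget v2'), (2, 'Gadget').
-- ItemNumber 1 appears twice because the pairs differ.
```

Only the exact duplicate row is removed, so ItemNumber alone is not made unique.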

In this case, since your GROUP BY groups by the column that actually varies, you get the same results. I'm actually slightly surprised that SQL Server doesn't (in the GROUP BY example) insist that the subqueried column is mentioned in the GROUP BY list.

Same results, but in my quick test the second one has a more expensive sort step to apply the DISTINCT.

Both were beaten out of sight by ROW_NUMBER though...

with T as
(
SELECT ItemNumber, 
       ItemDescription,
       ROW_NUMBER() OVER ( PARTITION BY ItemNumber ORDER BY DateCreated DESC) AS RN
FROM Transactions
)
SELECT * FROM T
WHERE RN=1

edit ...which in turn was thumped by Joe's solution on my test setup.


Test Setup

CREATE TABLE Transactions
(
ItemNumber INT not null,
ItemDescription VARCHAR(50) not null,
DateCreated DATETIME not null
)

INSERT INTO Transactions
SELECT 
number, NEWID(),DATEADD(day, cast(rand(CAST(newid() as varbinary))*10000 
  as int),getdate()) 
FROM master.dbo.spt_values

ALTER TABLE dbo.Transactions ADD CONSTRAINT
    PK_Transactions PRIMARY KEY CLUSTERED 
    (ItemNumber,DateCreated)
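
One way to compare the candidate queries against this test table is to turn on SET STATISTICS IO and SET STATISTICS TIME and read the Messages output. This is only a sketch of how such a comparison could be run; it is an assumption, not necessarily how the timings mentioned above were measured.

```sql
-- Report logical reads and CPU/elapsed time for each statement
-- in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate 1: correlated subquery with GROUP BY (from the question)
SELECT t.ItemNumber,
  (SELECT TOP 1 ItemDescription
   FROM Transactions
   WHERE ItemNumber = t.ItemNumber
   ORDER BY DateCreated DESC) AS ItemDescription
FROM Transactions t
GROUP BY t.ItemNumber;

-- Candidate 2: ROW_NUMBER, keeping only the newest row per item
WITH T AS
(
    SELECT ItemNumber,
           ItemDescription,
           ROW_NUMBER() OVER (PARTITION BY ItemNumber
                              ORDER BY DateCreated DESC) AS RN
    FROM Transactions
)
SELECT ItemNumber, ItemDescription
FROM T
WHERE RN = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Results will depend on the data and indexes, so the numbers are worth re-checking on your own tables.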

Based on the data & simple queries, both will return the same results. However, the fundamental operations are very different.

DISTINCT, as AakashM beat me to pointing out, is applied to all column values, including those from subselects and computed columns. All DISTINCT does is remove duplicate rows from the output, comparing every column in the select list. This is why it's generally considered a hack: people use it to get rid of duplicates without understanding why the query is returning them in the first place (typically because they should be using IN or EXISTS rather than a join). PostgreSQL is the only database I know of with a DISTINCT ON clause, which does work as the OP probably intended.
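
The join-versus-EXISTS point can be sketched as follows. The table variables @Items and @Tx and their rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical tables, declared inline for illustration only.
DECLARE @Items TABLE (ItemNumber INT PRIMARY KEY);
DECLARE @Tx    TABLE (ItemNumber INT, DateCreated DATETIME);

INSERT INTO @Items VALUES (1), (2), (3);
INSERT INTO @Tx    VALUES (1, '20230101'), (1, '20230102'), (2, '20230103');

-- Join: item 1 is produced once per matching transaction, so DISTINCT
-- is needed to hide the duplicates the join itself created.
SELECT DISTINCT i.ItemNumber
FROM @Items i
INNER JOIN @Tx t ON t.ItemNumber = i.ItemNumber;

-- EXISTS: only asks whether a match exists, so each item appears
-- at most once and no DISTINCT is required.
SELECT i.ItemNumber
FROM @Items i
WHERE EXISTS (SELECT 1 FROM @Tx t WHERE t.ItemNumber = i.ItemNumber);
-- Both queries return items 1 and 2.
```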

A GROUP BY clause is different - its primary use is grouping rows so that aggregate functions are computed accurately. To serve that function, the output contains one row per unique combination of the columns listed in the GROUP BY clause. This query would never need DISTINCT, because the values of interest are already unique.
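
For contrast, this is the kind of query GROUP BY is designed for, written against the question's Transactions table. The aliases TransactionCount and LastCreated are made up for this sketch.

```sql
-- One output row per ItemNumber, with aggregates computed per group.
SELECT t.ItemNumber,
       COUNT(*)           AS TransactionCount,  -- rows in the group
       MAX(t.DateCreated) AS LastCreated        -- newest date in the group
FROM Transactions t
GROUP BY t.ItemNumber;
```

DISTINCT has no equivalent for this: it can remove duplicate rows but cannot compute a value per group.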

Conclusion

This is a poor example, because it portrays DISTINCT and GROUP BY as equals when they are not.

If you're running at least SQL Server 2005 and can use a CTE, this is a little cleaner IMHO.

EDIT: As pointed out in Martin's answer, this also performs much better.

;with cteMaxDate as (
    select t.ItemNumber, max(DateCreated) as MaxDate
        from Transactions t
        group by t.ItemNumber
)
SELECT t.ItemNumber, t.ItemDescription
    FROM cteMaxDate md
        inner join Transactions t
            on md.ItemNumber = t.ItemNumber
                and md.MaxDate = t.DateCreated

Yes, they will return the same results.

Since you're not using any aggregate functions, SQL Server should be smart enough to treat the GROUP BY as a DISTINCT.

You may also be interested in checking out the following Stack Overflow post for further reading on this topic:

Is there any difference between Group By and Distinct?

GROUP BY is needed to properly return results when using aggregate functions in a sql query. As you are not using an aggregate function, there is no need for the GROUP BY, and thus the queries are the same.
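
With no aggregate in the select list, the two forms can be compared directly on the question's Transactions table. This is a minimal sketch; whether the optimizer actually produces the same plan for both is something to confirm in the execution plan rather than assume.

```sql
-- Both statements return one row per distinct ItemNumber.
SELECT t.ItemNumber
FROM Transactions t
GROUP BY t.ItemNumber;

SELECT DISTINCT t.ItemNumber
FROM Transactions t;
```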

Yes they return the same results.

Normally the GROUP BY clause groups the rows by the specified column, which matters when you have an aggregate such as SUM in your SELECT statement. Say you have a table like:

O_Id        OrderDate   OrderPrice      Customer
1           2008/11/12  1000            Hansen
2           2008/10/23  1600            Nilsen
3           2008/09/02  700             Hansen
4           2008/09/03  300             Hansen
5           2008/08/30  2000            Jensen
6           2008/10/04  100             Nilsen

If you group by Customer and ask for the sum of the order price, you will get:

Customer    SUM(OrderPrice)
Hansen      2000
Nilsen      1700
Jensen      2000

In contrast, DISTINCT just removes duplicate rows. In this case the original table would stay the same, since each row is different from the others.
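
The example above can be reproduced with the following sketch. The table variable name @Orders is an assumption (the answer does not name the table), and the DATE type assumes SQL Server 2008 or later.

```sql
-- Sample data from the table above, held in a table variable.
DECLARE @Orders TABLE
(
    O_Id       INT,
    OrderDate  DATE,
    OrderPrice INT,
    Customer   VARCHAR(50)
);

INSERT INTO @Orders (O_Id, OrderDate, OrderPrice, Customer)
VALUES (1, '20081112', 1000, 'Hansen'),
       (2, '20081023', 1600, 'Nilsen'),
       (3, '20080902',  700, 'Hansen'),
       (4, '20080903',  300, 'Hansen'),
       (5, '20080830', 2000, 'Jensen'),
       (6, '20081004',  100, 'Nilsen');

-- GROUP BY with an aggregate: one row per customer with the total.
SELECT Customer, SUM(OrderPrice) AS TotalPrice
FROM @Orders
GROUP BY Customer;
-- Hansen 2000, Nilsen 1700, Jensen 2000

-- DISTINCT over every column: all six rows remain, because no two
-- rows are identical.
SELECT DISTINCT O_Id, OrderDate, OrderPrice, Customer
FROM @Orders;
```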
