How to select only one full row per group in a "group by" query?

2023-01-04 10:36 问答作者：

In SQL Server, I have a table where a column A stores some data. This data can contain duplicates (ie. two or more rows will have the same value for the column A).

I can easily find the duplicates by doing:

select A, count(A) as CountDuplicates
from TableName
group by A having (count(A) > 1)

Now, I want to retrieve the values of other columns, let's say B and C. Of course, those B and C values can be different even for the rows sharing the same A value, but it doesn't matter for me. I just want any B value and any C one, the first, the last or the random one.

If I had a small table and one or two columns to retrieve, I would do something like:

select A, count(A) as CountDuplicates, (
    select top 1 child.B from TableName a开发者_如何学编程s child where child.A = base.A) as B
)
from TableName as base group by A having (count(A) > 1)

The problem is that I have much more rows to get, and the table is quite big, so having several children selects will have a high performance cost.

So, is there a less ugly pure SQL solution to do this?

Not sure if my question is clear enough, so I give an example based on AdventureWorks database. Let's say I want to list available States, and for each State, get its code, a city (any city) and an address (any address). The easiest, and the most inefficient way to do it would be:

var q = from c in data.StateProvinces select new { c.StateProvinceCode, c.Addresses.First().City, c.Addresses.First().AddressLine1 };

in LINQ-to-SQL and will do two selects for each of 181 States, so 363 selects. I my case, I am searching for a way to have a maximum of 182 selects.

The ROW_NUMBER function in a CTE is the way to do this. For example:

DECLARE @mytab TABLE (A INT, B INT, C INT)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 1, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 1, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 2, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 3, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (2, 2, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 3)
;WITH numbered AS 
(
    SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C)
        FROM @mytab AS m
)
SELECT *
    FROM numbered
    WHERE rn=1

As I mentioned in my comment to HLGEM and Philip Kelley, their simple use of an aggregate function does not necessarily return one "solid" record for each A group; instead, it may return column values from many separate rows, all stitched together as if they were a single record. For example, if this were a PERSON table, with the PersonID being the "A" column, and distinct contact records (say, Home and Word), you might wind up returning the person's home city, but their office ZIP code -- and that's clearly asking for trouble.

The use of the ROW_NUMBER, in conjunction with a CTE here, is a little difficult to get used to at first because the syntax is awkward. But it's becoming a pretty common pattern, so it's good to get to know it.

In my sample I've define a CTE that tacks on an extra column rn (standing for "row number") to the table, that itself groups by the A column. A SELECT on that result, filtering to only those having a row number of 1 (i.e., the first record found for that value of A), returns a "solid" record for each A group -- in my example above, you'd be certain to get either the Work or Home address, but not elements of both mixed together.

It concerns me that you want any old value for fields b and c. If they are to be meaningless why are you returning them?

If it truly doesn't matter (and I honestly can't imagine a case where I would ever want this, but it's what you said) and the values for b and c don't even have to be from the same record, group by with the use of mon or max is the way to go. It's more complicated if you want the values for a particular record for all fields.

select A, count(A) as CountDuplicates, min(B) as B , min(C) as C
from TableName as base 
group by A 
having (count(A) > 1)

you can do some thing like this if you have id as primary key in your table

select id,b,c from tablename 
inner join
(
select id, count(A) as CountDuplicates
from TableName as base group by A,id having (count(A) > 1) 
)d on tablename.id= d.id

继续阅读：group-by sql tsql

How to select only one full row per group in a "group by" query?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？