How to develop a T-SQL subquery to select only one record per client?
I am using SSMS 2008 and trying to select just one row per client. I need to select the following columns: client_name, end_date, and program. Some clients have just one row, but others have multiple. For those clients with multiple rows, the rows normally have different end_date and program values. For instance:
CLIENT PROGRAM END_DATE
a b c
a d e
a f g
h d e
h f NULL
This is a greatly simplified version of the actual data. As you can see, different clients can be in the same program ("d"), but the same client cannot be in the same program more than once.
The tricky part is that end_date can be NULL. When I tried selecting the clients with more than one row, I added a HAVING clause with a > 1 condition, but this eliminated all of my rows with a NULL end_date.
To sum up, I want one row per client: the clients with only one row in total, plus, for the clients with multiple rows as shown above, the row chosen by the following criterion:
- Select only the row where the end_date is either the greatest or NULL. (In most cases the end_date is NULL for these clients.)
How can I achieve this with as little logic as possible?
On SQL Server 2005 and up, you can use a Common Table Expression (CTE) combined with the ROW_NUMBER() function and its PARTITION BY clause. The CTE "partitions" your data by one criterion (in your case, Client), creating a partition for each separate client. ROW_NUMBER() then numbers the rows within each partition, ordered by another criterion (here, the End_Date column, which I declared as DATETIME), assigning numbers from 1 on up separately for each partition.
So in this case, ordering by End_Date DESC, the newest row gets numbered as 1, and that's the fact I use when selecting from the CTE. I used the ISNULL() function here to assign rows that have a NULL end_date an arbitrary far-future value ('99991231') so that they sort first. I wasn't quite sure I understood your question properly: did you want to select the NULL rows over those with a given end_date, or did you want to give precedence to an existing end_date value over NULL?
This will select the most recent row for each client (for each "partition" of your data):
-- Sample data (the multi-row VALUES syntax requires SQL Server 2008 or later)
DECLARE @clients TABLE (Client CHAR(1), Program CHAR(1), End_Date DATETIME);
INSERT INTO @clients (Client, Program, End_Date)
VALUES ('a', 'b', '20090505'),
       ('a', 'd', '20100808'),
       ('a', 'f', '20110303'),
       ('h', 'd', '20090909'),
       ('h', 'f', NULL);
-- Number the rows per client, newest End_Date first; NULL is treated as the far future
;WITH LatestData AS
(
    SELECT Client, Program, End_Date,
           ROW_NUMBER() OVER (PARTITION BY Client ORDER BY ISNULL(End_Date, '99991231') DESC) AS RowNum
    FROM @clients
)
SELECT Client, Program, End_Date
FROM LatestData
WHERE RowNum = 1;
This results in the following output:
Client Program End_Date
a f 2011-03-03
h f (NULL)
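If you would rather have an existing end date win over NULL, a minimal sketch (assuming the same @clients table variable and sample data as above) is to drop the ISNULL() wrapper. SQL Server sorts NULL as the lowest value, so with a descending sort the NULL rows land last within each partition:

```sql
-- Sketch only: same sample data as above, re-declared so this batch runs on its own.
DECLARE @clients TABLE (Client CHAR(1), Program CHAR(1), End_Date DATETIME);

INSERT INTO @clients (Client, Program, End_Date)
VALUES ('a', 'b', '20090505'),
       ('a', 'd', '20100808'),
       ('a', 'f', '20110303'),
       ('h', 'd', '20090909'),
       ('h', 'f', NULL);

;WITH LatestData AS
(
    SELECT Client, Program, End_Date,
           -- No ISNULL() here: NULL sorts lowest in SQL Server, so DESC places it last
           ROW_NUMBER() OVER (PARTITION BY Client ORDER BY End_Date DESC) AS RowNum
    FROM @clients
)
SELECT Client, Program, End_Date
FROM LatestData
WHERE RowNum = 1;
```

With the sample data, client h would then return program d (2009-09-09) instead of the NULL row. A client whose rows all have a NULL end_date still gets exactly one row.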