Aggregate SQL Function to grab only the first from each group
I have 2 tables - an Account table and a Users table. Each account can have multiple users. I have a scenario where I want to execute a single query/join against these two tables, but I want all the Account data (Account.*) and only the first set of user data (specifically their name).
Instead of doing a "min" or "max" on my aggregated group, I wanted to do a "first". But, apparently, there is no "First" aggregate function in TSQL.
Any suggestions on how to go about getting this query? Obviously, it is easy to get the cartesian product of Account x Users:
SELECT User.Name, Account.* FROM Account, User
WHERE Account.ID = User.Account_ID
But how might I got about only getting the first user from the product based on the order of the开发者_JS百科ir User.ID ?
Rather than grouping, go about it like this...
select
*
from account a
join (
select
account_id,
row_number() over (order by account_id, id) -
rank() over (order by account_id) as row_num from user
) first on first.account_id = a.id and first.row_num = 0
I know my answer is a bit late, but that might help others. There is a way to achieve a First() and Last() in SQL Server, and here it is :
Stuff(Min(Convert(Varchar, DATE_FIELD, 126) + Convert(Varchar, DESIRED_FIELD)), 1, 23, '')
Use Min() for First() and Max() for Last(). The DATE_FIELD should be the date that determines if it is the first or last record. The DESIRED_FIELD is the field you want the first or the last value. What it does is :
- Add the date in ISO format at the start of the string (23 characters long)
- Append the DESIRED_FIELD to that string
- Get the MIN/MAX value for that field (since it start with the date, you will get the first or last record)
- Stuff that concatened string to remove the first 23 characters (the date part)
Here you go!
EDIT: I got problems with the first formula : when the DATE_FIELD has .000 as milliseconds, SQL Server returns the date as string with NO milliseconds at all, thus removing the first 4 characters from the DESIRED_FIELD. I simply changed the format to "20" (without milliseconds) and it works all great. The only downside is if you have two fields that were created at the same seconds, the sort can possibly be messy... in which cas you can revert to "126" for the format.
Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + Convert(Varchar, DESIRED_FIELD)), 1, 19, '')
EDIT 2 : My original intent was to return the last (or first) NON NULL row. I got asked how to return the last or first row, wether it be null or not. Simply add a ISNULL to the DESIRED_FIELD. When you concatenate two strings with a + operator, when one of them is NULL, the result is NULL. So use the following :
Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + IsNull(Convert(Varchar, DESIRED_FIELD), '')), 1, 19, '')
Select *
From Accounts a
Left Join (
Select u.*,
row_number() over (Partition By u.AccountKey Order By u.UserKey) as Ranking
From Users u
) as UsersRanked
on UsersRanked.AccountKey = a.AccountKey and UsersRanked.Ranking = 1
This can be simplified by using the Partition By clause. In the above, if an account has three users, then the subquery numbers them 1,2, and 3, and for a different AccountKey, it will reset the numnbering. This means for each unique AccountKey, there will always be a 1, and potentially 2,3,4, etc.
Thus you filter on Ranking=1 to grab the first from each group.
This will give you one row per account, and if there is at least one user for that account, then it will give you the user with the lowest key(because I use a left join, you will always get an account listing even if no user exists). Replace Order By u.UserKey
with another field if you prefer that the first user be chosen alphabetically or some other criteria.
I've benchmarked all the methods, the simpelest and fastest method to achieve this is by using outer/cross apply
SELECT u.Name, Account.* FROM Account
OUTER APPLY (SELECT TOP 1 * FROM User WHERE Account.ID = Account_ID ) as u
CROSS APPLY works just like INNER JOIN and fetches the rows where both tables are related, while OUTER APPLY works like LEFT OUTER JOIN and fetches all rows from the left table (Account here)
You can use OUTER APPLY, see documentation.
SELECT User1.Name, Account.* FROM Account
OUTER APPLY
(SELECT TOP 1 Name
FROM [User]
WHERE Account.ID = [User].Account_ID
ORDER BY Name ASC) User1
SELECT (SELECT TOP 1 Name
FROM User
WHERE Account_ID = a.AccountID
ORDER BY UserID) [Name],
a.*
FROM Account a
The STUFF response from Dominic Goulet is slick. But, if your DATE_FIELD is SMALLDATETIME (instead of DATETIME), then the ISO 8601 length will be 19 instead of 23 (because SMALLDATETIME has no milliseconds) - so adjust the STUFF parameter accordingly or the return value from the STUFF function will be incorrect (missing the first four characters).
First and Last do not exist in Sql Server 2005 or 2008, but in Sql Server 2012 there is a First_Value, Last_Value function. I tried to implement the aggregate First and Last for Sql Server 2005 and came to the obstacle that sql server does guarantee the calculation of the aggregate in a defined order. (See attribute SqlUserDefinedAggregateAttribute.IsInvariantToOrder Property, which is not implemented.) This might be because the query analyser tries to execute the calculation of the aggregate on multiple threads and combine the results, which speeds up the execution, but does not guarantee an order in which elements are aggregated.
Define "First". What you think of as first is a coincidence that normally has to do with clustered index order but should not be relied on (you can contrive examples that break it).
You are right not to use MAX() or MIN(). While tempting, consider the scenario where you the first name and last name are in separate fields. You might get names from different records.
Since it sounds like all your really care is that you get exactly one arbitrary record for each group, what you can do is just MIN or MAX an ID field for that record, and then join the table into the query on that ID.
There are a number of ways of doing this, here a a quick and dirty one.
Select (SELECT TOP 1 U.Name FROM Users U WHERE U.Account_ID = A.ID) AS "Name,
A.*
FROM Account A
(Slightly Off-Topic, but) I often run aggregate queries to list exception summaries, and then I want to know WHY a customer is in the results, so use MIN and MAX to give 2 semi-random samples that I can look at in details e.g.
SELECT Customer.Id, COUNT(*) AS ProblemCount
, MIN(Invoice.Id) AS MinInv, MAX(Invoice.Id) AS MaxInv
FROM Customer
INNER JOIN Invoice on Invoice.CustomerId = Customer.Id
WHERE Invoice.SomethingHasGoneWrong=1
GROUP BY Customer.Id
Create and join with a subselect 'FirstUser' that returns the first user for each account
SELECT User.Name, Account.*
FROM Account, User,
(select min(user.id) id,account_id from User group by user.account_id) as firstUser
WHERE Account.ID = User.Account_ID
and User.id = firstUser.id and Account.ID = firstUser.account_id
精彩评论