How to limit the return of a join to just one row per table?
Today at work we got into a discussion about which is the best way to do a query like this :
For instance lets assume a users table :
tblUsers
ID = Autoint
Name = String
and a login table :
tblLogin
ID = AUtoint
UserID = Int
IP = String
Browser = String
OS = String
timestamp = DateTime
What would be the most efficient way to list all the users and the last time they logged in (if ever), and provide an output like :
user | ip | timestamp | browser | os |
-------------------------------------------------
Some User |1.1.1.1 | 12/12/12 | userBA | win |
Other User |1.1.1.1 | 12/12/12 | userBA | win |
And Other |null | null | null | null |
Other Yet |1.1.1.1 | 12/12/12 | userBA | win |
Keep in mind th开发者_Go百科at what we want here is to show all users exactly once even if he has never logged in, and only the most recent login (i.e max(timestamp)).
Is there a way to do it in one SQL statement?
We are using MSSQL 2005.
Thanks in advance guys, Jim
SELECT tblUsers.Name, MAX(tblLogin.timestamp)
FROM
tblUsers LEFT JOIN tblLogin ON tblUsers.ID = tblLogin.UserID
GROUP BY tblUsers.ID
From experience the following query is usually several times faster
select
u.name,
l1.ip,
l1.timestamp,
l1.browser,
l1.os
from
tblUsers u
inner join
tblLogin l1
on
u.id = l1.userid
and l1.Id = ISNULL(
(select
top 1 l2.id
from
tblLogin l2
where
u.id = l2.userid
order by
timestamp desc), 0)
than this query:
select *
from (
select u.name, l.ip, l.timestamp, l.browser, l.os,
row_number() over (partition by u.id order by timestamp desc) rn
from tblUsers u
inner join tblLogin l on u.id = l.userid
) sub
where rn = 1
At one time I was particularly interested in this topic as I have a huge ( several million rows ) tables that I needed to process similar way. So I set up a test doing this both ways and the faster query ran about 20 seconds, while the slower one ran about 3 minutes 15 seconds. (This was on SQL 2005). Your set up of course could be different and this also depends on indices, but if performance is critical for you I would test it both ways and choose one that is performs better.
Usual disclaimer: I didn't actually run the query above, it is there to illustrate the idea, a few syntax errors are possible.
Im my opinion, the most readable way uses row_number()
. You can use it to number rows, starting with 1 for each user, like:
select *
from (
select u.name, l.ip, l.timestamp, l.browser, l.os,
row_number() over (partition by u.id order by timestamp desc) rn
from tblUsers u
inner join tblLogin l on u.id = l.userid
) sub
where rn = 1
A filter on rn = 1
gives the latest row per user. A subquery is required because SQL Server 2005 does not allow you to reference a row_number()
in a where
clause.
The most efficient way to do this depends on the amount of logins per user. You can find a good explanation of some of the more advanced methods in this blog post.
;WITH cLogins AS
(
SELECT
L.ip, M.LastSeen, L.browser, L.os
FROM
(SELECT UserID, MAX(timestamp) AS LastSeen FROM tblLogin GROUP BY UserID) M
LEFT JOIN
tblLogin L ON M.UserID = L.UserID AND M.LastSeen = L.JOIN
)
SELECT
I.Name, L.ip, L.LastSeen, L.browser, L.os
FROM
tblUsers U
LEFT JOIN
cLogins L ON U.UserID = L.UserID
精彩评论