开发者

How to limit the return of a join to just one row per table?

Today at work we got into a discussion about which is the best way to do a query like this :

For instance lets assume a users table :

tblUsers
ID      = Autoint
Name    = String

and a login table :

tblLogin
ID         = AUtoint
UserID    = Int
IP        = String
Browser   = String
OS        = String
timestamp = DateTime

What would be the most efficient way to list all the users and the last time they logged in (if ever), and provide an output like :

user       | ip     | timestamp | browser | os   |
-------------------------------------------------
Some User  |1.1.1.1 | 12/12/12  | userBA  | win  |
Other User |1.1.1.1 | 12/12/12  | userBA  | win  |
And Other  |null    | null      | null    | null |
Other Yet  |1.1.1.1 | 12/12/12  | userBA  | win  |

Keep in mind th开发者_Go百科at what we want here is to show all users exactly once even if he has never logged in, and only the most recent login (i.e max(timestamp)).

Is there a way to do it in one SQL statement?

We are using MSSQL 2005.

Thanks in advance guys, Jim


SELECT tblUsers.Name, MAX(tblLogin.timestamp)
FROM 
tblUsers LEFT JOIN tblLogin ON tblUsers.ID = tblLogin.UserID
GROUP BY tblUsers.ID


From experience the following query is usually several times faster

select 
    u.name, 
    l1.ip, 
    l1.timestamp, 
    l1.browser, 
    l1.os
from 
    tblUsers u
inner join 
    tblLogin l1 
on 
    u.id = l1.userid
    and l1.Id = ISNULL(
        (select 
            top 1 l2.id 
        from 
            tblLogin l2 
        where 
            u.id = l2.userid 
        order by 
            timestamp desc), 0)

than this query:

select *
from (
    select u.name, l.ip, l.timestamp, l.browser, l.os,
      row_number() over (partition by u.id order by timestamp desc) rn 
    from tblUsers u
    inner join tblLogin l on u.id = l.userid
) sub
where rn = 1

At one time I was particularly interested in this topic as I have a huge ( several million rows ) tables that I needed to process similar way. So I set up a test doing this both ways and the faster query ran about 20 seconds, while the slower one ran about 3 minutes 15 seconds. (This was on SQL 2005). Your set up of course could be different and this also depends on indices, but if performance is critical for you I would test it both ways and choose one that is performs better.

Usual disclaimer: I didn't actually run the query above, it is there to illustrate the idea, a few syntax errors are possible.


Im my opinion, the most readable way uses row_number(). You can use it to number rows, starting with 1 for each user, like:

select *
from (
    select u.name, l.ip, l.timestamp, l.browser, l.os,
      row_number() over (partition by u.id order by timestamp desc) rn 
    from tblUsers u
    inner join tblLogin l on u.id = l.userid
) sub
where rn = 1

A filter on rn = 1 gives the latest row per user. A subquery is required because SQL Server 2005 does not allow you to reference a row_number() in a where clause.

The most efficient way to do this depends on the amount of logins per user. You can find a good explanation of some of the more advanced methods in this blog post.


;WITH cLogins AS
(
  SELECT
     L.ip, M.LastSeen, L.browser, L.os
  FROM
      (SELECT UserID, MAX(timestamp) AS LastSeen FROM tblLogin GROUP BY UserID) M
      LEFT JOIN
      tblLogin L ON M.UserID = L.UserID AND M.LastSeen = L.JOIN 
)
SELECT
  I.Name, L.ip, L.LastSeen, L.browser, L.os
FROM 
  tblUsers U
  LEFT JOIN
  cLogins L ON U.UserID = L.UserID
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜