开发者

latest visitors

Say you want to display the latest visitors on a users 开发者_如何学Pythonprofile page. How would you structure this? Perhaps a table called uservisitors:

  • userid (the user that gets a visit)
  • visitorid (the visitor)
  • time

And how would you select this with mysql? Without any duplicates, What I mean is if User 1 visits user 2's profile, then 5min later visits again, I dont want it to show both entries only the latest

cheers!


SELECT  visitorid, MAX(time) AS lastvisit
FROM    uservisitors
WHERE   userid = ?
GROUP BY
        userid, visitorid
ORDER BY
        lastvisit DESC
LIMIT 5

Create a composite index on (userid, visitorid, time) for this to work faster.

This query:

SELECT  visitorid
FROM    uservisitors ui
WHERE   userid = ?
        NOT EXISTS
        (
        SELECT  NULL
        FROM    uservisitors uo
        WHERE   uo.userid = ui.userid
                AND uo.visitorid = ui.visitorid
                AND uo.time > ui.time
        )
ORDER BY
        time DESC
LIMIT 5

may be more efficient if you have lots of distinct visitors.

In this case, you'll need an index on (userid, time, visitorid).

Update:

This article in my blog compares both approaches:

  • Latest visitors


Something like this should work:

SELECT visitorid, MAX(time)
FROM uservisitors
WHERE userid = 1
GROUP BY visitorid


Find distinct visitors in the last 5 minutes. You can change the INTERVAL to whatever time period such as INTERVAL 1 DAY for the last 24 hours.

SELECT distinct visitorid 
FROM uservisitors
WHERE userid = 1 AND time > DATE_SUB(NOW(), INTERVAL 5 MIN);


Assuming you only care about the last visit by a visitor and not ALL the visits, how about using a unique key pair for userid/visitorid and then using INSERT INTO.. ON DUPLICATE KEY:

CREATE TABLE uservisitors (... UNIQUE (userid, visitorid) );
INSERT INTO uservisitors (userid, visitorid, time) VALUES (....) ON DUPLICATE KEY UPDATE time=NOW();

Then its a simple select to get the top 5 visitors:

SELECT visitorid FROM uservisitors WHERE user_id=1 ORDER BY time DESC LIMIT 5;

No duplicate records or having to use group by.


How do you plan on deleting stuff from the table later on? You might need to add another index for that to work efficiently.

If you only want the last 5 visits, I'd denormalize the table (saves space, index and only uses one single primary key lookup) and in the process, I would no longer have to worry about deleting old data. Each user will only occupy one row in the table so it will not grow much over time. So:

CREATE TABLE user_visitors (user_id int primary key,
visitor_1_id int,
visitor_1_time timestamp,
..
visitor_5_id int,
visitor_5_time timestamp);

To store a visit you would either insert a new row with visitor_1 as the visitor. On duplicates, you would shift down previously stored values:

INSERT INTO user_visitors SET ... 
... ON DUPLICATE KEY UPDATE visitor_5_id = visitor_4_id, visitor_5_time = visitor_4_time ...
... visitor_1_id = ?, visitor_1_time = ?

If you worry about duplicates you might be able to work it out by adding IFs in the UPDATE section, such that the row won't get updated if the visitor_id is already present in the row. In code you can check for updated row count. If zero, handle the duplicate visitor time update from there. It involves some work but it will be fast and easy to understand.


I've dealt with this a few times, here's my take on it.

My table looks like this:

CREATE TABLE visitors (
userid int,
visitorid int,
last_visit datetime,
primary key(userid, visitorid),
index(visitorid)
index(userid, last_visit)
) engine = memory;

Inserting data:

INSERT INTO visitors (userid, last_visit) VALUES ( 50, now() ) 
ON DUPLICATE KEY UPDATE last_visit = now();

Selecting:

select * from visitors WHERE userid=10 order by last_visit limit 10; # or whatever you need

Occasionally purge the table

DELETE from visitors WHERE last_visit < date_add(now(), INTERVAL -1 WEEK);

This is the way to go for a few reasons.

  1. You're using an in memory table, so you never touch disk.
  2. Index is on (visitorid, last_update), so it's pure index lookup. Very fast. Even without it, it should be fast.
  3. Pulling users via a separate query will let you cache them, but in theory they should be cached already if they've visited the site recently. Even if you aren't using cache, an in() query on user (primary key) should be very fast.

You can run a cron to backup this table once a minute

SELECT * from visitors INTO OUTFILE "/tmp/visitors.txt"
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜