Enrich website log data with user id in SQL
Assume that we have typical session log from the website with cookies and user id (in case he/she provided credentials):
cookie id timestamp
af312b452c . 31-03-11 1:44 PM
af312b452c . 31-03-11 1:46 PM
af312b452c . 31-03-11 1:48 PM
af312b452c . 31-03-11 1:51 PM
af312b452c . 31-03-11 1:53 PM
af312b452c . 31-03-11 1:56 PM
af312b452c 1 31-03-11 1:58 PM
af312b452c 1 31-03-11 2:01 PM
31-03-11 1:58 PM - user logged-in, so since then we are tracking his/her activity alongside with user_id.
I want to enrich such data by filling missing fields backwards to have complete information what particular user has seen on the webpage:
cookie id timestamp
af312b452c 1 31-03-11开发者_如何学编程 1:44 PM
af312b452c 1 31-03-11 1:46 PM
af312b452c 1 31-03-11 1:48 PM
af312b452c 1 31-03-11 1:51 PM
af312b452c 1 31-03-11 1:53 PM
af312b452c 1 31-03-11 1:56 PM
af312b452c 1 31-03-11 1:58 PM
af312b452c 1 31-03-11 1:58 PM
What will be the best way to do it in SQL?
Notes:
- whole table has many rows,
- users can have multiple cookies.
- a particular cookie value may have been used for a different user at some point in the past
Thank you all in advance.
I would start by creating a function that would get me the most recent logged in ID of a cookie given that cookie and the timestamp you logged that cookie (psuedocode off the top of my head, apologies if something is off):
CREATE FUNCTION getFutureLogin(in_cookie VARCHAR, in_timestamp DATETIME) RETURNS INT AS
BEGIN
DECLARE logged_in_id IN
SELECT logged_in_id = TOP 1 id
FROM T_NAME
WHERE cookie = in_cookie
AND timestamp > in_timestamp
AND id IS NOT NULL
ORDER BY timestamp ASC
RETURN logged_in_id
END
From there, it's a simple update that you'd probably schedule a job to do sometime.
UPDATE T_NAME SET id = getFutureLogin(cookie, timestamp) WHERE id IS NULL
May not be the most straightforward or fastest way, but it would work.
精彩评论