SQL Query to delete records occuring within 3000 milliseconds of each other
I have a table of data recording certain user events. The results looks something like:
ID Username EventDate
1 UserA 2010-10-21 16:59:59.367
2 UserA 2010-10-21 17:00:00.114
3 UserA 2010-10-21 17:00:00.003
4 UserA 2010-10-21 17:00:02.867
5 UserB 2010-10-21 18:43:26.538
6 UserB 2010-10-21 18:47:33.373
I want to run a query that removes all events that occur within 3000 milliseconds of a previous event. Note that milliseconds 开发者_如何学编程are relevant.
The resulting table would look like:
ID Username EventDate
1 UserA 2010-10-21 16:59:59.367
4 UserA 2010-10-21 17:00:02.867
5 UserB 2010-10-21 18:43:26.538
6 UserB 2010-10-21 18:47:33.373
How can I do this?
You can use a while
loop to remove one row at a time. This avoids the problem where multiple rows are all within 3 seconds of eachother, but not within 3 seconds of the first row.
For example:
declare @t table (ID int, Username varchar(50), EventDate datetime)
insert @t
select 1, 'UserA', '2010-10-21 16:59:59.367'
union all select 2, 'UserA', '2010-10-21 17:00:00.114'
union all select 3, 'UserA', '2010-10-21 17:00:00.003'
union all select 4, 'UserA', '2010-10-21 17:00:02.867'
union all select 5, 'UserB', '2010-10-21 18:43:26.538'
union all select 6, 'UserB', '2010-10-21 18:47:33.373'
while 1=1
begin
delete @t
where ID =
(
select top 1 t2.ID
from @t t2
where exists
(
select *
from @t t1
where t1.Username = t2.Username
and t1.EventDate < t2.EventDate
and datediff(millisecond, t1.EventDate,
t2.EventDate) <= 3000
)
)
if @@ROWCOUNT = 0
break
end
select * from @t
This prints:
ID Username EventDate
1 UserA 2010-10-21 16:59:59.367
4 UserA 2010-10-21 17:00:02.867
5 UserB 2010-10-21 18:43:26.537
6 UserB 2010-10-21 18:47:33.373
If we're removing these results from a result set, and each users events are treated separately, then the following works (stealing table defn from Andomar's answer):
declare @t table (ID int, Username varchar(50), EventDate datetime)
insert @t
select 1, 'UserA', '2010-10-21 16:59:59.367'
union all select 2, 'UserA', '2010-10-21 17:00:00.114'
union all select 3, 'UserA', '2010-10-21 17:00:00.003'
union all select 4, 'UserA', '2010-10-21 17:00:02.867'
union all select 5, 'UserB', '2010-10-21 18:43:26.538'
union all select 6, 'UserB', '2010-10-21 18:47:33.373'
;WITH PerUserIDs AS (
SELECT ID,Username,EventDate,ROW_NUMBER() OVER (PARTITION BY Username ORDER BY EventDate) as R from @t
), Sequenced AS (
SELECT ID,Username,EventDate,R from PerUserIDs where R = 1
union all
select pui.ID,pui.UserName,pui.EventDate,pui.R
from
Sequenced s
inner join
PerUserIDs pui
on
s.R < pui.R and
s.Username = pui.Username and
DATEDIFF(millisecond,s.EventDate,pui.EventDate) >= 3000
where
not exists(select * from PerUserIDs anti where anti.R < pui.R and s.R < anti.R and s.Username = anti.username and DATEDIFF(millisecond,s.EventDate,anti.EventDate)>= 3000)
)
select * from Sequenced order by Username,EventDate
If you do need to actually delete, then you can delete from your table where ID not in (select ID from Sequenced)
By previous event, did you mean insertion? Why dont take a time line as base?
You could write a procedure to do this by iterating through the events in order of date, and comparing the value of the previous event to the current one using DATEDIFF. If this is an active system and your goal is to prevent duplicate event logs, it would make more sense to use a trigger to prevent the insertion of any new event in a similar fashion.
精彩评论