开发者

SQL Query to delete records occuring within 3000 milliseconds of each other

I have a table of data recording certain user events. The results looks something like:

ID    Username EventDate
1     UserA    2010-10-21 16:59:59.367
2     UserA    2010-10-21 17:00:00.114    
3     UserA    2010-10-21 17:00:00.003
4     UserA    2010-10-21 17:00:02.867
5     UserB    2010-10-21 18:43:26.538
6     UserB    2010-10-21 18:47:33.373

I want to run a query that removes all events that occur within 3000 milliseconds of a previous event. Note that milliseconds 开发者_如何学编程are relevant.

The resulting table would look like:

ID    Username EventDate
1     UserA    2010-10-21 16:59:59.367
4     UserA    2010-10-21 17:00:02.867
5     UserB    2010-10-21 18:43:26.538
6     UserB    2010-10-21 18:47:33.373

How can I do this?


You can use a while loop to remove one row at a time. This avoids the problem where multiple rows are all within 3 seconds of eachother, but not within 3 seconds of the first row.

For example:

declare @t table (ID int, Username varchar(50), EventDate datetime)
insert @t
          select 1,     'UserA',    '2010-10-21 16:59:59.367'
union all select 2,     'UserA',    '2010-10-21 17:00:00.114'    
union all select 3,     'UserA',    '2010-10-21 17:00:00.003'
union all select 4,     'UserA',    '2010-10-21 17:00:02.867'
union all select 5,     'UserB',    '2010-10-21 18:43:26.538'
union all select 6,     'UserB',    '2010-10-21 18:47:33.373'


while 1=1
    begin
    delete  @t
    where   ID =
            (
            select  top 1 t2.ID
            from    @t t2
            where   exists
                    (
                    select  *
                    from    @t t1
                    where   t1.Username = t2.Username
                            and t1.EventDate < t2.EventDate
                            and datediff(millisecond, t1.EventDate, 
                                         t2.EventDate) <= 3000
                    )
            )

    if @@ROWCOUNT = 0 
        break
    end

select * from @t

This prints:

ID  Username    EventDate
1   UserA       2010-10-21 16:59:59.367
4   UserA       2010-10-21 17:00:02.867
5   UserB       2010-10-21 18:43:26.537
6   UserB       2010-10-21 18:47:33.373


If we're removing these results from a result set, and each users events are treated separately, then the following works (stealing table defn from Andomar's answer):

declare @t table (ID int, Username varchar(50), EventDate datetime)
insert @t
          select 1,     'UserA',    '2010-10-21 16:59:59.367'
union all select 2,     'UserA',    '2010-10-21 17:00:00.114'    
union all select 3,     'UserA',    '2010-10-21 17:00:00.003'
union all select 4,     'UserA',    '2010-10-21 17:00:02.867'
union all select 5,     'UserB',    '2010-10-21 18:43:26.538'
union all select 6,     'UserB',    '2010-10-21 18:47:33.373'

;WITH PerUserIDs AS (
    SELECT ID,Username,EventDate,ROW_NUMBER() OVER (PARTITION BY Username ORDER BY EventDate) as R from @t
), Sequenced AS (
    SELECT ID,Username,EventDate,R from PerUserIDs where R = 1
    union all
    select pui.ID,pui.UserName,pui.EventDate,pui.R
    from
        Sequenced s
            inner join
        PerUserIDs pui
            on
                s.R < pui.R and
                s.Username = pui.Username and
                DATEDIFF(millisecond,s.EventDate,pui.EventDate) >= 3000
    where
        not exists(select * from PerUserIDs anti where anti.R < pui.R and s.R < anti.R and s.Username = anti.username and DATEDIFF(millisecond,s.EventDate,anti.EventDate)>= 3000)
)
select * from Sequenced order by Username,EventDate

If you do need to actually delete, then you can delete from your table where ID not in (select ID from Sequenced)


By previous event, did you mean insertion? Why dont take a time line as base?


You could write a procedure to do this by iterating through the events in order of date, and comparing the value of the previous event to the current one using DATEDIFF. If this is an active system and your goal is to prevent duplicate event logs, it would make more sense to use a trigger to prevent the insertion of any new event in a similar fashion.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜