开发者

How do I average the difference between specific values in TSQL?

Hey folks, sorry this is a bit of a longer question...

I have a table with t开发者_JS百科he following columns:

[ChatID] [User] [LogID] [CreatedOn] [Text]

What I need to find is the average response time for a given user id, to another specific user id. So, if my data looks like:

[1] [john] [20] [1/1/11 3:00:00] [Hello]
[1] [john] [21] [1/1/11 3:00:23] [Anyone there?]
[1] [susan] [22] [1/1/11 3:00:43] [Hello!]
[1] [susan] [23] [1/1/11 3:00:53] [What's up?]
[1] [john] [24] [1/1/11 3:01:02] [Not much]
[1] [susan] [25] [1/1/11 3:01:08] [Cool]

...then I need to see that Susan has an average response time of (20 + 6) / 2 => 13 seconds to John, and John has an average of (9 / 1) => 9 seconds to Susan.

I'm not even sure this can be done in set-based logic, but if anyone has any ideas, they'd be much appreciated!


I don't have a PC to verify syntax or anything, but I think this should give you a starting place:

WITH ChatWithRownum AS (
    SELECT ChatID, User, LogID, CreatedOn, ROW_NUMBER() OVER(ORDER BY ChatID, CreatedOn) AS rownum
    FROM ChatLog
)
SELECT First.ChatID, Second.User,
    AVG(DATEDIFF(seconds, First.CreatedOn, Second.CreatedOn)) AS AvgElapsedTime
FROM ChatWithRownum First
    JOIN ChatWithRownum Second ON First.ChatID = Second.ChatID
        AND First.rownum = Second.rownum - 1
WHERE First.User != Second.User
GROUP BY First.ChatID, Second.User

Essentially, the idea is to add row numbers to the data so you can join one row to the next row (so you have a statement followed by its immediate response). Once you have the rows joined, you can get the time elapsed between the two entries, and then group the data by the ChatID (I'm assuming that times between separate chats aren't relevant) and the two users. Like I said though, this is just a starting place as I'm sure there may be some additional criteria and/or bugs in my query :)


Try something simple like the following before moving into cursors.

select ChatId, User, datediff('second', min(CreatedOn, max(CreatedOn))/count(*)
from ChatLog
group by ChatId, User

This one works and doesn't involve using cursors. If I had more time, I could probably even eliminate the temp table, but hey... it works.

declare @operator varchar(50)
set @operator = 'john'
declare @customer varchar(50)
set @customer = 'susan'
declare @chatid int
set @chatid = 1

declare @t table (chatid int, username varchar(50), responsetime int)

insert @t (chatid, username, responsetime)
select ChatId, 
    Username,
    datediff(second, 
    CreatedOn,
    (
        select min(createdon)
        from chatlog
        where createdon > cl.createdon
        and username = @customer
              and chatid = @chatid
    ))
from ChatLog cl
where chatid = @chatid and username = @operator

insert @t (chatid, username, responsetime)
select ChatId, 
    Username, 
    datediff(second, 
    CreatedOn,
    (
        select min(createdon)
        from chatlog
        where createdon > cl.createdon
        and username = @operator
              and chatid = @chatid
    ))
from ChatLog cl
where chatid = @chatid and username = @customer

select chatid, username, avg(responsetime) as avgresponsetime 
from @t
group by chatid, username
order by username


Seems like you need a cursor to step through each line and check for the change of user in the record, and get the difference of that time, and store it somewhere(temp table maybe), and aggregate it later.

I believe it can be done in TSQL, logic would be something like:


DECLARE delta CURSOR FOR
SELECT user, createdon from table
order by createdon --or logid

OPEN delta
fetch next from delta into @username, @time
while @@fetch_status = 0

begin

FETCH PRIOR FROM delta into @username_prior, @time_prior
IF @username_prior  @username
BEGIN
  @timedelta = @time - @time_prior
  @total = @total + @timedelta
  insert into #average (@username, @total)
END 

fetch next from delta into @username, @time
END

CLOSE delta
DEALLOCATE delta

SELECT user, AVG(time) from #average
group by user

I'm sure you can figure out how to declare all the parameters.


This can be done with RowNumber() and DateDiff()

WITH TableWithOrderings AS (
    SELECT DateTime, ROW_NUMBER() OVER (ORDER BY DateTime) AS Ordering
    FROM myTable
)

WITH Intervals As (
    SELECT DATEDIFF(second, A.DateTime, B.DateTime) AS IntervalSeconds
    FROM TableWithOrderings A
        INNER JOIN TableWithOrderings B ON B.Ordering = A.Ordering + 1
)

SELECT AVG(IntervalSeconds) FROM Intervals


Try this:

create table chats
(
chat_id int not null,
user_name text not null,
log_id int not null primary key,
created_on timestamp not null,
message text not null
);


insert into chats(chat_id, user_name, log_id, created_on, message)
values(1, 'john', 20, '1/1/11 3:00:00', 'Hello'),
(1, 'john',21, '1/1/11 3:00:23', 'Anyone there?'),
(1, 'susan',22, '1/1/11 3:00:43', 'Hello!'),
(1, 'susan', 23, '1/1/11 3:00:53', 'What''s up?'),
(1, 'john', 24, '1/1/11 3:01:02', 'Not much'),
(1, 'susan', 25, '1/1/11 3:01:08', 'Cool')

Sample data:

select c.*, 'x', next.*
from chats c
left join chats next on next.log_id = c.log_id + 1 
order by c.log_id

Output:

 chat_id | user_name | log_id |     created_on      |    message    | ?column? | chat_id | user_name | log_id |     created_on      |    message    
---------+-----------+--------+---------------------+---------------+----------+---------+-----------+--------+---------------------+---------------
       1 | john      |     20 | 2011-01-01 03:00:00 | Hello         | x        |       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there?
       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there? | x        |       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!
       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!        | x        |       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?
       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?    | x        |       1 | john      |     24 | 2011-01-01 03:01:02 | Not much
       1 | john      |     24 | 2011-01-01 03:01:02 | Not much      | x        |       1 | susan     |     25 | 2011-01-01 03:01:08 | Cool
       1 | susan     |     25 | 2011-01-01 03:01:08 | Cool          | x        |         |           |        |                     | 

The grouping:

select c.*, 'x', next.*, count(case when next.user_name is null or next.user_name <> c.user_name then 1 end) over(order by c.log_id)
from chats c
left join chats next on next.log_id + 1 = c.log_id 
order by c.log_id

Output:

 chat_id | user_name | log_id |     created_on      |    message    | ?column? | chat_id | user_name | log_id |     created_on      |    message    | count 
---------+-----------+--------+---------------------+---------------+----------+---------+-----------+--------+---------------------+---------------+-------
       1 | john      |     20 | 2011-01-01 03:00:00 | Hello         | x        |         |           |        |                     |               |     1
       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there? | x        |       1 | john      |     20 | 2011-01-01 03:00:00 | Hello         |     1
       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!        | x        |       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there? |     2
       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?    | x        |       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!        |     2
       1 | john      |     24 | 2011-01-01 03:01:02 | Not much      | x        |       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?    |     3
       1 | susan     |     25 | 2011-01-01 03:01:08 | Cool          | x        |       1 | john      |     24 | 2011-01-01 03:01:02 | Not much      |     4
(6 rows)

The grouped result:

with grouped_result as
(
select c.log_id, c.user_name, count(case when next.user_name is null or next.user_name <> c.user_name then 1 end) over(order by c.log_id) as the_grouping
from chats c
left join chats next on next.log_id + 1 = c.log_id 
order by c.log_id
)
select user_name, max(log_id) as last_chat_of_each_user
from grouped_result
group by the_grouping
    ,user_name
order by last_chat_of_each_user

Output:

 user_name | last_chat_of_each_user 
-----------+------------------------
 john      |                     21
 susan     |                     23
 john      |                     24
 susan     |                     25
(4 rows)

Chat and responses:

with grouped_result as
(
select c.log_id, c.user_name, count(case when next.user_name is null or next.user_name <> c.user_name then 1 end) over(order by c.log_id) as the_grouping
from chats c
left join chats next on next.log_id + 1 = c.log_id 
order by c.log_id
),
last_chats as
(
select user_name as responded_to, max(log_id) as last_chat_of_each_user
from grouped_result
group by the_grouping
    ,responded_to
)
select lc.responded_to, lc.last_chat_of_each_user as responded_to_log_id, lc_the_chat.created_on as responded_to_timestamp, 'x',  answered_by.user_name as responded_by, answered_by.created_on as response_created_on
from last_chats lc
join chats lc_the_chat on lc_the_chat.log_id = lc.last_chat_of_each_user
join chats answered_by on answered_by.log_id = lc.last_chat_of_each_user + 1
order by lc.last_chat_of_each_user

Output:

 responded_to | responded_to_log_id | responded_to_timestamp | ?column? | responded_by | response_created_on 
--------------+---------------------+------------------------+----------+--------------+---------------------
 john         |                  21 | 2011-01-01 03:00:23    | x        | susan        | 2011-01-01 03:00:43
 susan        |                  23 | 2011-01-01 03:00:53    | x        | john         | 2011-01-01 03:01:02
 john         |                  24 | 2011-01-01 03:01:02    | x        | susan        | 2011-01-01 03:01:08
(3 rows)

Chat's response average time:

with grouped_result as
(
select c.log_id, c.user_name, count(case when next.user_name is null or next.user_name <> c.user_name then 1 end) over(order by c.log_id) as the_grouping
from chats c
left join chats next on next.log_id + 1 = c.log_id 
order by c.log_id
),
last_chats as
(
select user_name as responded_to, max(log_id) as last_chat_of_each_user
from grouped_result
group by the_grouping
    ,responded_to
),
responses as
(
select lc.responded_to, lc.last_chat_of_each_user as responded_to_log_id, lc_the_chat.created_on as responded_to_timestamp,  answered_by.user_name as responded_by, answered_by.created_on as response_created_on
from last_chats lc
join chats lc_the_chat on lc_the_chat.log_id = lc.last_chat_of_each_user
join chats answered_by on answered_by.log_id = lc.last_chat_of_each_user + 1
order by lc.last_chat_of_each_user
)
select responded_by, responded_to, sum(response_created_on - responded_to_timestamp), count(*), avg(response_created_on - responded_to_timestamp) as average_response_to_person
from responses
group by responded_by, responded_to

Output:

 responded_by | responded_to |   sum    | count | average_response_to_person 
--------------+--------------+----------+-------+----------------------------
 susan        | john         | 00:00:26 |     2 | 00:00:13
 john         | susan        | 00:00:09 |     1 | 00:00:09
(2 rows)

Will work out-of-the box on Postgresql. To make it work on Sql Server, just change the response_created_on - responded_to_timestamp to corresponding Sql Server DATEDIFF construct (i cannot recall off the top my head what's the DATEDIFF for seconds)


This will get the job done, but I'm not sure how it will scale:

select spoke, responded, count(*) responses, avg(time_diff) avg_seconds from (
select a.user_name spoke, b.user_name responded, a.created_on spoke_at, min(b.created_on) responded_at, datediff(ss, a.created_on, min(b.created_on)) time_diff
from chats a, chats b
where a.chat_id = b.chat_id
 and a.log_id < b.log_id
 and not exists (select 1 from chats c where c.chat_id = a.chat_id and c.log_id < b.log_id and c.log_id > a.log_id)
group by a.user_name, b.user_name, a.created_on
) users group by spoke, responded

 spoke     responded     responses     avg_seconds    
 --------  ------------  ------------  -------------- 
 john      john          1             23             
 susan     john          1             9              
 john      susan         2             13             
 susan     susan         1             10   

4 record(s) selected [Fetch MetaData: 0ms] [Fetch Data: 0ms]

It should be okay with an index on (chat_id, log_id).

If you'd like to eliminate same responses, all you need is a != in the outer where clause:

select spoke, responded, count(*) responses, avg(time_diff) avg_seconds from (
select a.user_name spoke, b.user_name responded, a.created_on spoke_at, min(b.created_on) responded_at, datediff(ss, a.created_on, min(b.created_on)) time_diff
from chats a, chats b
where a.chat_id = b.chat_id
 and a.log_id < b.log_id
 and not exists (select 1 from chats c where c.chat_id = a.chat_id and c.log_id < b.log_id and c.log_id > a.log_id)
group by a.user_name, b.user_name, a.created_on
) users 
where spoke != responded
group by spoke, responded
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜