
In R, how do I get statistics on the time differences between sessions?

I have data that represents the sessions of different users, in a format like this:

User       StartTime        EndTime
user1      1291043867      1291044055
user2      1290970409      1290972041
user3      1291019561      1291019562
user2      1290897232      1290897244
user1      1291100532      1291100559
user3      1291142492      1291142496
user2      1291128374      1291128391
user2      1291032746      1291032748
...

Note that the timestamps are unix times.

I need summary statistics, such as the mean and percentiles, of the number of sessions per user. I also need the average time between successive sessions for all users. How do I go about doing this in R?


This should get you partially started.

sfac <- read.table(textConnection("User       StartTime        EndTime
user1      1291043867      1291044055
user2      1290970409      1290972041
user3      1291019561      1291019562
user2      1290897232      1290897244
user1      1291100532      1291100559
user3      1291142492      1291142496
user2      1291128374      1291128391
user2      1291032746      1291032748"), header = TRUE)

sfac$diff <- with(sfac, EndTime - StartTime) # add difference
sfac.split <- split(sfac, sfac$User)

# number of sessions per user
lapply(sfac.split, nrow)

$user1
[1] 2

$user2
[1] 4

$user3
[1] 2

# mean session duration per user, in seconds
lapply(sfac.split, function(x) mean(x$diff))

$user1
[1] 107.5

$user2
[1] 415.75

$user3
[1] 2.5
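
To go from the per-user counts to the summary statistics the question asks for (mean and percentiles of the number of sessions per user), something along these lines might work. This is only a sketch in base R: the names sessions, n_sessions and count_stats are made up for illustration, and the quartile probabilities are an arbitrary choice.

```r
# Same sample data as above (read.table's text= argument avoids textConnection)
sessions <- read.table(text = "User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748", header = TRUE)

# One count per user, as a named integer vector
n_sessions <- vapply(split(sessions$StartTime, sessions$User),
                     length, integer(1))

# Summary of the distribution of session counts across users
count_stats <- list(
  mean      = mean(n_sessions),
  quantiles = quantile(n_sessions, probs = c(0.25, 0.5, 0.75))
)
print(count_stats)
```

summary(n_sessions) gives a similar overview in one call; quantile() is useful when specific percentiles (say 0.9 or 0.99) are needed.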


You can probably get most things you want with ddply and summarise from plyr:

now <- Sys.time()  # take the timestamp once so EndTime - StartTime is exactly 2
foo <- data.frame(
    User = paste("user", c(1:3, 1:3, 1:3), sep = ""),
    StartTime = as.numeric(now + 1:9 * 10),
    EndTime = as.numeric(now + 1:9 * 10 + 2))

library(plyr)

ddply(foo, "User", summarise,
    Nvisits = length(StartTime),
    AvgTimePerSes = mean(EndTime - StartTime),
    # sort first so the differences are between successive sessions
    AvgTimeBetweenSes = mean(diff(sort(StartTime)))
)
   User Nvisits AvgTimePerSes AvgTimeBetweenSes
1 user1       3             2                30
2 user2       3             2                30
3 user3       3             2                30

Edit:

Using the dataframe from Roman's answer:

foo <- read.table(textConnection("User       StartTime        EndTime
user1      1291043867      1291044055
user2      1290970409      1290972041
user3      1291019561      1291019562
user2      1290897232      1290897244
user1      1291100532      1291100559
user3      1291142492      1291142496
user2      1291128374      1291128391
user2      1291032746      1291032748"), header = TRUE)


library(plyr)

ddply(foo, "User", summarise,
    Nvisits = length(StartTime),
    AvgTime = mean(EndTime - StartTime),
    # the rows are not in time order, so sort StartTime before differencing;
    # without the sort, user2's average comes out as 20779
    AvgBetweenSes = mean(diff(sort(StartTime)))
)
   User Nvisits AvgTime AvgBetweenSes
1 user1       2  107.50      56665.00
2 user2       4  415.75      77047.33
3 user3       2    2.50     122931.00
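
Note that AvgBetweenSes above is measured from one session's start to the next session's start. If "time between sessions" should instead mean the idle time from the end of one session to the start of the next, and a single average over all users is wanted as well, a base R sketch could look like the following. The names sessions, gaps, mean_gap_per_user and mean_gap_overall are hypothetical, and the end-to-start definition is an assumption about what the question intends.

```r
sessions <- read.table(text = "User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748", header = TRUE)

# Put each user's sessions in chronological order
sessions <- sessions[order(sessions$User, sessions$StartTime), ]

# Idle gap = start of the next session minus end of the previous one,
# computed separately within each user
gaps <- lapply(split(sessions, sessions$User), function(x) {
  x$StartTime[-1] - x$EndTime[-nrow(x)]
})

mean_gap_per_user <- sapply(gaps, mean)   # one average per user
mean_gap_overall  <- mean(unlist(gaps))   # pooled over all gaps

print(mean_gap_per_user)
print(mean_gap_overall)
```

A user with only one session has no gaps, so their per-user mean will be NaN; the pooled average simply ignores them. Pooling all gaps weights users with many sessions more heavily than mean(mean_gap_per_user) would, so which one is appropriate depends on the question being asked.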