In R, how do I get statistics on the time difference between sessions?
I have data that represents different sessions of users. It is in a format like this:
User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748
...
Note that the timestamps are Unix times (seconds).
I need summary statistics, such as the mean and percentiles, on the number of sessions per user. I also need the average time between successive sessions for all users. How do I go about doing this in R?
This should get you partially started.
sfac <- read.table(textConnection("User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748"), header = TRUE)
sfac$diff <- with(sfac, EndTime - StartTime) # add difference
sfac.split <- split(sfac, sfac$User)
# number of sessions per user
lapply(sfac.split, nrow)
$user1
[1] 2
$user2
[1] 4
$user3
[1] 2
# mean session duration per user (seconds)
lapply(sfac.split, function(x) mean(x$diff))
$user1
[1] 107.5
$user2
[1] 415.75
$user3
[1] 2.5
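To go from the per-user counts above to the mean and percentiles of the number of sessions, something along these lines may work. It is only a sketch: the names n.sessions and q are made up for illustration, and the percentile levels are an arbitrary choice.

```r
# Same sample data as above
sfac <- read.table(text = "User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748", header = TRUE)

# One count per user, as a named vector instead of a list
n.sessions <- sapply(split(sfac, sfac$User), nrow)

# Summary statistics across users
mean(n.sessions)                                     # 8 sessions / 3 users
q <- quantile(n.sessions, probs = c(0.25, 0.5, 0.75))
q
summary(n.sessions)                                  # min, quartiles, mean, max
```

sapply is used instead of lapply so that the counts come back as a plain vector, which mean and quantile accept directly.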
You can probably get most of what you want with ddply and summarise from plyr:
foo <- data.frame(
User = paste("user",c(1:3,1:3,1:3),sep=""),
StartTime = as.numeric(Sys.time() + 1:9*10),
EndTime = as.numeric(Sys.time() + 1:9*10 + 2))
library(plyr)
ddply(foo,"User",summarise,
Nvisits = length(StartTime),
AvgTimePerSes = mean(EndTime - StartTime),
AvgTimeBetweenSes = mean(diff(sort(StartTime))) # sort so the gaps are between consecutive sessions
)
User Nvisits AvgTimePerSes AvgTimeBetweenSes
1 user1 3 2 30
2 user2 3 2 30
3 user3 3 2 30
Edit:
Using the dataframe from Roman's answer:
foo <- read.table(textConnection("User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748"), header = TRUE)
library(plyr)
ddply(foo,"User",summarise,
Nvisits = length(StartTime),
AvgTime = mean(EndTime - StartTime),
AvgBetweenSes = mean(diff(sort(StartTime))) # sort first: the rows are not in time order
)
User Nvisits AvgTime AvgBetweenSes
1 user1 2 107.50 56665.00
2 user2 4 415.75 77047.33
3 user3 2 2.50 122931.00
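If a single figure over all users is wanted rather than one per user, the gaps can be pooled before averaging. The sketch below assumes that "time between successive sessions" means the difference between consecutive StartTime values of the same user after sorting; measuring from the previous EndTime would be an equally reasonable reading. The names gaps and all.gaps are made up for illustration.

```r
foo <- read.table(text = "User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748", header = TRUE)

# Per user: sort the start times, then take consecutive differences
gaps <- lapply(split(foo$StartTime, foo$User), function(s) diff(sort(s)))

# Pool the gaps of all users into one vector (users with one session add nothing)
all.gaps <- unlist(gaps, use.names = FALSE)

mean(all.gaps)                                   # pooled mean gap, in seconds
quantile(all.gaps, probs = c(0.25, 0.5, 0.75))   # pooled percentiles
```

Note that the pooled mean weights every gap equally, so users with many sessions count more than they would in an average of the per-user means.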