Only Selecting Cases for all Time Periods

I have a longitudinal data set for a month in which there is some user attrition.

I'd like to subset the data to just those users who are active across all 30 days, but I could not find an example of this type of subset. Here is an example of the data layout:

date          userID       x
2001-11-08    1            20
2001-11-08    2            2
2001-11-08    3            10
2001-11-08    4            5
2001-11-08    5            1
2001-11-09    1            19
2001-11-09    3            4
2001-11-09    4            5
...
2001-11-30    1            15


subset(dnow, ave(as.numeric(date), userID, FUN=function(x) length(unique(x)))==30)


You should consider using the data processing tools in the plyr package.

library(plyr)

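# Simulated data: 31 days with 3 records per day, each assigned to a random user.
# A Date is used here because adding integers to the POSIXct value returned by
# ISOdate() would add seconds, not days.
startdate <- as.Date("2011-01-01")
userdata <- data.frame(
        date = startdate + rep(0:30, each=3),
        userID = 1 + round(9*runif(93)),
        x = round(100*runif(93))
)

# Count the distinct days on which each user appears
# (with only 93 random rows, the filter below will almost certainly return no rows)
activity <- ddply(userdata, .(userID), summarize, activedays=length(unique(date)))
activity[activity$activedays >= 30, ]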

You can find out more about plyr at Hadley's excellent website: http://had.co.nz/plyr/


I would use ave to determine the number of days each user was active per month.

Data$activeDays <- ave(Data$userID, Data$userID, FUN=length)
Data[ Data$activeDays >= 30, ]

It would be a bit more tricky if your data set contains multiple months...
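
For the multi-month case, one possible approach is to group by both user and month and compare the number of distinct active days with the number of calendar days in that month. This is only a sketch: the toy data, the `month` and `daysInMonth` columns, and the `daysInMonth()` helper are made up for illustration, and it assumes `date` is stored as a Date.

```r
# Hypothetical toy data: user 1 is active on every day of two months,
# user 2 misses one day in November 2001.
days <- seq(as.Date("2001-11-01"), as.Date("2001-12-31"), by = "day")
Data <- rbind(
  data.frame(date = days, userID = 1),
  data.frame(date = days[days != as.Date("2001-11-15")], userID = 2)
)
Data$x <- seq_len(nrow(Data))

# Month key such as "2001-11"
Data$month <- format(Data$date, "%Y-%m")

# Distinct active days per user within each month
Data$activeDays <- ave(as.numeric(Data$date), Data$userID, Data$month,
                       FUN = function(d) length(unique(d)))

# Hypothetical helper: number of calendar days in the month of each date.
# first + 31 always falls in the following month, so snapping it back to
# the 1st gives the start of the next month.
daysInMonth <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))
  nextFirst <- as.Date(format(first + 31, "%Y-%m-01"))
  as.numeric(nextFirst - first)
}
Data$daysInMonth <- daysInMonth(Data$date)

# Keep only the user-months in which the user was active every day
fullMonths <- Data[Data$activeDays == Data$daysInMonth, ]
```

Here `fullMonths` keeps both months for user 1 but only December for user 2. If you need users who are complete in every month, you would still have to aggregate this result per user.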


which(tapply(userdata$date, userdata$userID, length) == 30)
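
Note that the `tapply` call gives you the qualifying users rather than the subset itself. A rough sketch of how the rows could then be pulled out is below; the toy data and the names `activeDays`, `fullUsers` and `complete` are assumptions for illustration, and I count distinct dates in case a user has more than one row per day.

```r
# Hypothetical toy data: user 1 appears on all 30 days, user 2 on only 29.
days <- as.Date("2001-11-01") + 0:29
userdata <- data.frame(
  date   = c(days, days[-1]),
  userID = c(rep(1, 30), rep(2, 29)),
  x      = seq_len(59)
)

# Distinct active days per user
activeDays <- tapply(userdata$date, userdata$userID,
                     function(d) length(unique(d)))

# tapply names its result by userID, so recover the IDs from the names
fullUsers <- names(activeDays)[activeDays == 30]

# Keep only the rows belonging to those users
complete <- userdata[userdata$userID %in% fullUsers, ]
```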