How to extract (or subset) all the rows with a condition on a column in a dataframe?
I have the following dataframe and wish to extract all rows corresponding to the same group with status==1.
The status column is either 0 or 1.
df <- data.frame(time = rep(1:4, times = c(2, 3, 5, 4)), status = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))
Input data:
   time status
1     1      0
2     1      0
3     2      1
4     2      1
5     2      0
6     3      0
7     3      0
8     3      0
9     3      0
10    3      0
11    4      1
12    4      0
13    4      0
14    4      0
Desired output (with the group column renumbered in sequence):
 time status
    1      1
    1      1
    1      0
    2      1
    2      0
    2      0
    2      0
My actual data.frame has on the order of 10^6 rows and 5 columns.
Thank you for your help.
Hm, so you want groups two and four, since both of these groups contain a row with status 1, correct? And you would like all the rows of those two groups in the output?
If so, how about this:
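df <- data.frame(time = rep(1:4, times = c(2, 3, 5, 4)),
                 status = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))
# groups (time values) that contain at least one row with status == 1
id <- unique(df[df$status == 1, "time"])
# keep every row that belongs to one of those groups
df2 <- df[df$time %in% id, ]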
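Edit: to renumber the remaining groups in sequence (two groups are left here, hence the labels 1 and 2):
# note: this turns time into a factor
df2$time <- factor(df2$time, labels = c(1, 2))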
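The labels in the edit above are written out by hand for two groups. As a rough sketch of a renumbering that does not depend on knowing the number of groups in advance, base R's match() could be used instead. Resetting the row names is an extra step assumed here only to make the printout resemble the desired output.

```r
df <- data.frame(time = rep(1:4, times = c(2, 3, 5, 4)),
                 status = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))

# groups that contain at least one row with status == 1
id <- unique(df[df$status == 1, "time"])
df2 <- df[df$time %in% id, ]

# match() gives, for each row, the position of its group within id,
# so the kept groups are renumbered 1, 2, ... in order of appearance
df2$time <- match(df2$time, id)

# assumption: drop the original row names so they restart at 1
rownames(df2) <- NULL
print(df2)
```

With this approach time stays an integer column rather than becoming a factor.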
I'm a bit confused, as you seem to be asking for two separate things: you say you want to extract rows with status == 1, yet your desired output also contains rows with status 0. Here is one way to extract only the rows with status == 1.
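df1 <- data.frame(time = rep(1:4, times = c(2, 3, 5, 4)),
                  status = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))
df1$time <- factor(df1$time)
# split into one data.frame per group
df1s <- split(df1, df1$time)
# within each group, keep only the rows with status == 1
df1l <- lapply(df1s, function(x) {
  x[x$status == 1, ]
})
# bind the per-group results back into a single data.frame
df1l <- do.call("rbind", df1l)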
Or the other way around: filter first, then split by group:
df2 <- df1[df1$status == 1, ]
df2.l <- split(df2, as.factor(df2$time))
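If the intent is instead to keep whole groups that contain at least one status of 1 (as the desired output in the question suggests), one possible sketch uses base R's ave() to compute a per-group maximum. The names group_max and whole_groups are made up for this illustration.

```r
df1 <- data.frame(time = rep(1:4, times = c(2, 3, 5, 4)),
                  status = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))

# ave() applies max() within each time group and repeats the result
# on every row of that group, so the output has one value per row
group_max <- ave(df1$status, df1$time, FUN = max)

# since status is 0 or 1, a group maximum of 1 means the group has a 1
whole_groups <- df1[group_max == 1, ]
print(whole_groups)
```

This keeps the rows with status 0 that sit in the same group as a 1, which the two snippets above drop.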