
How to extract (or subset) all the rows with a condition on a column in a dataframe?

I have the following data frame and want to extract all rows of every group (defined by time) that contains at least one row with status == 1.

The status column is either 0 or 1.

df <- data.frame(time = rep(1:4, times = c(2,3,5,4)), status = c(0,0,1,1,0,0,0,0,0,0,1,0,0,0))

  Input Data 

   time status

1     1      0
2     1      0
3     2      1
4     2      1
5     2      0
6     3      0
7     3      0
8     3      0
9     3      0
10    3      0
11    4      1
12    4      0
13    4      0
14    4      0

Desired output (with the group column renumbered in sequence):

time status

   1      1
   1      1
   1      0
   2      1
   2      0
   2      0
   2      0

My actual data.frame has on the order of 10^6 rows and 5 columns.

Thank you for your help.

Hm, so you want groups two and four, since both of these groups contain a row with status 1, correct? And you would like to get all rows of those two groups?

If so, how about this:

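df <- data.frame(time = rep(1:4, times = c(2,3,5,4)),
                 status = c(0,0,1,1,0,0,0,0,0,0,1,0,0,0))

# groups (time values) that contain at least one status == 1 row
id <- unique(df[df$status == 1, "time"])
# keep every row belonging to one of those groups
df2 <- df[df$time %in% id, ]

# renumber the remaining groups 1, 2, ... (the result is a factor)
df2$time <- factor(df2$time, labels = seq_along(id))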
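If a plain integer column is preferred over a factor, the same idea can be wrapped in a small function that renumbers with match(). This is only a sketch: the name keep_groups_with_event is made up for illustration, and it assumes the columns are called time and status as in the question.

```r
df <- data.frame(time = rep(1:4, times = c(2, 3, 5, 4)),
                 status = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))

# Hypothetical helper: keep whole groups that contain a status == 1 row,
# then renumber the surviving groups 1, 2, ... in order of appearance.
keep_groups_with_event <- function(d) {
  ids <- unique(d$time[d$status == 1])       # groups with at least one event
  out <- d[d$time %in% ids, ]                # all rows of those groups
  out$time <- match(out$time, unique(out$time))  # integer renumbering
  rownames(out) <- NULL                      # reset row names
  out
}

res <- keep_groups_with_event(df)
```

Both %in% and match() are vectorised, so this should remain workable for a data frame with around 10^6 rows.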

I'm a bit confused, as you seem to be asking for two different things: you say you want to extract the rows with status == 1, yet your desired output also contains rows with status 0. Here is one way to extract only the rows with status == 1.

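df1 <- data.frame(time = rep(1:4, times = c(2,3,5,4)),
                  status = c(0,0,1,1,0,0,0,0,0,0,1,0,0,0))
df1$time <- factor(df1$time)
# split into one data frame per group, keep the status == 1 rows of each
df1s <- split(df1, df1$time)
df1l <- lapply(df1s, function(x) {
            x[x$status == 1, ]
})
# stack the pieces back into a single data frame
df1l <- do.call("rbind", df1l)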

Or the other way around: subset first, then split by group.

df2 <- df1[df1$status == 1, ]
df2.l <- split(df2, as.factor(df2$time))
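If the goal is instead to keep whole groups (including their status 0 rows), the split-based style can be adapted by filtering the list of groups rather than the rows inside each group. The following is a rough sketch using base R's Filter(); the object names groups, kept and res are chosen here for illustration only.

```r
df <- data.frame(time = rep(1:4, times = c(2, 3, 5, 4)),
                 status = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))

# one data frame per group
groups <- split(df, df$time)
# keep only the groups that contain at least one status == 1 row
kept <- Filter(function(g) any(g$status == 1), groups)
# stack the surviving groups back together
res <- do.call("rbind", kept)
```

On a large data frame, splitting into many small pieces is likely to be slower than a vectorised subset with %in%, so this form is mainly useful when per-group processing is needed anyway.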





