Transform a data.frame, while filling missing values

2023-04-08 01:20

I have the following data frame:

data <- data.frame(id=c("A","A","B","B"), day=c(5,6,1,2), duration=c(12,1440,5,6), obs.period=c(60,60,100,100))

It shows the subject ID, the day of each event, the duration of the event, and the subject's observation period.

I want to transform the data set so that it shows the whole observation period for each subject (every day of observation), with zero as the duration on days where no event was observed.

For the above dataset this would be something like this:

id  day  duration  obs.period
A   1    0         60
A   2    0         60
A   3    0         60
A   4    0         60
A   5    12        60
A   6    1440      60
A   7    0         60
A   8    0         60
...
A   60   0         60
B   1    5         100
B   2    6         100
B   3    0         100
B   4    0         100
...
B   100  0         100

Any ideas?

Here's one approach using the plyr package. First, create a function to expand the data into the appropriate number of rows. Then, index into that new data.frame with the duration info from the original data. Finally, call this function with ddply() and group on the id variable.

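library(plyr)

# Expand one subject's rows to cover every day of its observation period
FUN <- function(x) {
  dat <- data.frame(
    id = x$id[1],
    day = seq_len(x$obs.period[1]),
    duration = 0,
    obs.period = x$obs.period[1]
  )
  # Locate each observed day by position. Comparing dat$day == x$day
  # recycles the shorter vector and only matches by coincidence.
  dat$duration[match(x$day, dat$day)] <- x$duration
  dat
}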


ddply(data, "id", FUN)

    id day duration obs.period
1    A   1        0         60
2    A   2        0         60
3    A   3        0         60
4    A   4        0         60
5    A   5       12         60
6    A   6     1440         60
...
61   B   1        5        100
62   B   2        6        100
63   B   3        0        100
...
160  B 100        0        100
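
For anyone without plyr, the same split-apply-combine idea can be sketched in base R. This is only a sketch: expand_one and expanded are hypothetical names that do not appear in the answer above, and the code assumes obs.period is constant within each id.

```r
# Example data from the question
data <- data.frame(id = c("A", "A", "B", "B"), day = c(5, 6, 1, 2),
                   duration = c(12, 1440, 5, 6),
                   obs.period = c(60, 60, 100, 100))

# expand_one is a hypothetical helper: one subject's rows in,
# one row per day of the observation period out
expand_one <- function(x) {
  n <- x$obs.period[1]
  out <- data.frame(id = x$id[1], day = seq_len(n),
                    duration = 0, obs.period = n)
  # Copy the observed durations onto the matching days
  out$duration[match(x$day, out$day)] <- x$duration
  out
}

# Split by id, expand each piece, then stack the pieces again
pieces <- lapply(split(data, data$id), expand_one)
expanded <- do.call(rbind, pieces)
rownames(expanded) <- NULL
```

Here split() and lapply() do the grouping that ddply() handles in one call, and do.call(rbind, ...) does the recombining.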

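Create a skeleton data frame with one row per subject and day of observation, but no duration column, then merge it with your data and replace the NAs in duration with zeros. The skeleton has to be built from each subject's own obs.period; a fixed 1:60 would cut B off at day 60.

data <- data.frame(id=c("A","A","B","B"), day=c(5,6,1,2), duration=c(12,1440,5,6), obs.period=c(60,60,100,100))
# One row per subject with its observation period
periods <- unique(data[, c("id", "obs.period")])
# Skeleton: days 1..obs.period for each subject
zilch <- data.frame(id = rep(periods$id, periods$obs.period),
                    day = sequence(periods$obs.period),
                    obs.period = rep(periods$obs.period, periods$obs.period))
all <- merge(zilch, data, all.x = TRUE)
all$duration[is.na(all$duration)] <- 0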
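
To see the merge-and-fill mechanics in isolation, here is a small sketch on made-up numbers (a single subject observed for 4 days). The names skeleton, events and filled are purely illustrative.

```r
# Tiny skeleton: subject "A" observed for 4 days (made-up numbers)
skeleton <- data.frame(id = "A", day = 1:4)
events   <- data.frame(id = "A", day = c(2, 4), duration = c(12, 30))

# Left join: every skeleton row is kept, days without an event get NA
filled <- merge(skeleton, events, by = c("id", "day"), all.x = TRUE)

# Replace the NAs that mark event-free days with zero
filled$duration[is.na(filled$duration)] <- 0
filled
```

Using all.x = TRUE rather than all = TRUE keeps only the days in the skeleton, so an event recorded outside the observation period would not add an extra row.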

I would first create a data frame to contain the results.

ob.period <- with(data, tapply(obs.period, id, max))

n <- sum(ob.period)
result <- data.frame(id=rep(names(ob.period), ob.period),
                     day=unlist(lapply(ob.period, function(a) 1:a)),
                     duration=rep(0, n),
                     obs.period=rep(ob.period,ob.period))

Then I would paste id and day together, use match to find the relevant rows in the larger data frame, and plug in the duration values.

idday.sm <- paste(data$id, data$day, sep=":")
idday.lg <- paste(result$id, result$day, sep=":")

result$duration[match(idday.sm, idday.lg)] <- data$duration
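
A quick way to see what match() is doing here, on a shortened grid of 8 days instead of 60 (the shortening and the variable names are assumptions made for brevity):

```r
small <- c("A:5", "A:6")              # keys of the observed events
large <- paste("A", 1:8, sep = ":")   # keys of the full grid (8 days only)

# For each event key, the position of its first match in the grid
pos <- match(small, large)

# Start from all zeros and plug the observed durations in at those positions
duration <- rep(0, length(large))
duration[pos] <- c(12, 1440)
duration
```

Because match() returns positions in the order of its first argument, the durations line up with the right rows even if the original data are not sorted.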

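Here is another approach with plyr:

library(plyr)

# Append zero-duration rows for every day of the period that has no event
fill1 <- function(df) {
  # Days 1..obs.period for this subject (a fixed 1:100 would overfill shorter periods)
  full_period <- seq_len(df[1, "obs.period"])
  to_fill <- setdiff(full_period, df$day)
  fill_id <- df[1, "id"]
  fill_dur <- 0
  fill_obs.p <- df[1, "obs.period"]
  rows_to_add <- data.frame(id=fill_id, day=to_fill, duration=fill_dur, obs.period=fill_obs.p)
  rbind(df, rows_to_add)
}
ddply(data, "id", fill1)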

The rows are not sorted by day within each id, however; plyr's arrange() can reorder them afterwards.
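
If the ordering matters, one possible fix in base R is to reorder with order(). A minimal sketch on a few made-up rows (unsorted and sorted are illustrative names):

```r
# A few rows in the order the fill step might leave them (made-up example)
unsorted <- data.frame(id = c("A", "A", "B", "A"),
                       day = c(5, 6, 1, 1),
                       duration = c(12, 1440, 5, 0))

# Sort by id first, then by day within each id
sorted <- unsorted[order(unsorted$id, unsorted$day), ]
rownames(sorted) <- NULL
sorted
```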
