How do you apply a function to a nested list?

I need to get the maximum of a variable in a nested list. For a given station number "s" and a given member "m", mylist[[s]][[m]] has the form:

station date.time        member  bias
6019    2011-08-06 12:00 mbr003  86
6019    2011-08-06 13:00 mbr003  34

For each station, I need to get the maximum bias across all members. For s = 3, I managed to do it with:

library(plyr)
var1 <- mylist[[3]]
var2 <- lapply(var1, `[`, 4)
var3 <- laply(var2, .fun = max)
max.value <- max(var3)

Is there a way to avoid the column number "4" in the second line and use the variable name $bias in lapply instead, or is there a better way of doing this?


You can use [ with the names of columns of data frames as well as their index. So foo[4] will have the same result as foo["bias"] (assuming that bias is the name of the fourth column).

$bias isn't really the name of that column. $ is just another function in R, like [, that is used for accessing columns of data frames (among other things).
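To make that concrete, here is a minimal sketch of name-based access applied to the original problem. The list `var1` and its values are made up for illustration; they only mimic the structure of mylist[[3]] described in the question.

```r
# Hypothetical stand-in for mylist[[3]]: one data frame per member
var1 <- list(
  mbr001 = data.frame(station = 6019,
                      date.time = c("2011-08-06 12:00", "2011-08-06 13:00"),
                      member = "mbr001", bias = c(12, 40)),
  mbr003 = data.frame(station = 6019,
                      date.time = c("2011-08-06 12:00", "2011-08-06 13:00"),
                      member = "mbr003", bias = c(86, 34))
)

df <- var1[["mbr003"]]
# Indexing by position and by name selects the same one-column data frame
same.column <- identical(df[4], df["bias"])

# Maximum bias over all members of this station, using the column name only
max.value <- max(sapply(var1, function(d) max(d$bias)))
```

Inside an anonymous function, `d$bias` works as usual; it is only when passing the accessor itself to lapply that the name has to be given as a string.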

But now I'm going to go out on a limb and offer some advice on your data structure. If each element of your nested list contains the data for a unique combination of station and member, here is a simplified toy version of your data:

dat <- expand.grid(station = rep(1:3,each = 2),member = rep(1:3,each = 2))
dat$bias <- sample(50:100,36,replace = TRUE)

tmp <- split(dat,dat$station)
tmp <- lapply(tmp,function(x){split(x,x$member)})

> tmp
$`1`
$`1`$`1`
  station member bias
1       1      1   87
2       1      1   82
7       1      1   51
8       1      1   60

$`1`$`2`
   station member bias
13       1      2   64
14       1      2  100
19       1      2   68
20       1      2   74
etc.

tmp is a list of length three (one element per station), where each element is itself a list of length three (one element per member). Each innermost element is a data frame as shown above.

It's really much easier to store this kind of data as a single data frame. You'll notice I constructed it that way first (dat) and then split it twice. In this case you can rbind it all back together using code like this:

newDat <- do.call(rbind,lapply(tmp,function(x){do.call(rbind,x)}))
rownames(newDat) <- NULL

In this form, these sorts of calculations are much easier:

library(plyr)
#Find the max bias for each unique station+member
ddply(newDat,.(station,member),summarise, mx = max(bias))
  station member  mx
1       1      1  87
2       1      2 100
3       1      3  91
4       2      1  94
5       2      2  88
6       2      3  89
7       3      1  74
8       3      2  88
9       3      3  99

#Or maybe the max bias for each station across all members
ddply(newDat,.(station),summarise, mx = max(bias))
  station  mx
1       1 100
2       2  94
3       3  99


Here is another solution using repeated lapply.

lapply(tmp, function(x) lapply(lapply(x, '[[', 'bias'), max))
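That call returns a nested list with one maximum per station and member. If what you want is a single maximum per station, as in the question, a variant with sapply might look like the sketch below. It rebuilds the toy `tmp` from the earlier answer so that it runs on its own; the `set.seed` call is an addition for reproducibility.

```r
# Rebuild the toy nested list from the earlier answer
set.seed(1)
dat <- expand.grid(station = rep(1:3, each = 2), member = rep(1:3, each = 2))
dat$bias <- sample(50:100, 36, replace = TRUE)
tmp <- lapply(split(dat, dat$station), function(x) split(x, x$member))

# The inner sapply gives one maximum per member;
# the outer sapply reduces those to one value per station
station.max <- sapply(tmp, function(x) max(sapply(x, function(d) max(d$bias))))
station.max
```

The result is a named numeric vector with one entry per station.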


You may need to use [[ instead of [, but either should work fine with a string (don't use the $). Try:

var2 <- lapply( var1, `[`, 'bias' )

or

var2 <- lapply( var1, `[[`, 'bias' )

depending on whether you want each result as a one-column data frame ([) or as the column itself, i.e. a plain vector ([[).
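A small sketch of the difference between the two calls, assuming var1 is a list of data frames (the toy values below are hypothetical):

```r
# Hypothetical list of data frames standing in for var1
var1 <- list(
  a = data.frame(bias = c(86, 34)),
  b = data.frame(bias = c(12, 40))
)

with.single <- lapply(var1, `[`, "bias")    # each element: one-column data frame
with.double <- lapply(var1, `[[`, "bias")   # each element: numeric vector

class(with.single$a)
class(with.double$a)

# Per-member maxima computed from the plain vectors
member.max <- sapply(with.double, max)
```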
