How do you apply a function to a nested list?
I need to get the maximum of a variable in a nested list. For a given station number "s" and a given member "m", mylist[[s]][[m]] has the form:
station date.time member bias
6019 2011-08-06 12:00 mbr003 86
6019 2011-08-06 13:00 mbr003 34
For each station, I need to get the maximum of bias across all members. For s = 3, I managed to do it with:
library(plyr)
var1 <- mylist[[3]]
var2 <- lapply(var1, `[`, 4)
var3 <- laply(var2, .fun = max)
max.value <- max(var3)
Is there a way to avoid the column number "4" in the lapply call and use the variable name $bias instead, or a better way of doing it?
You can use [ with the names of a data frame's columns as well as their indices. So foo[4] gives the same result as foo["bias"] (assuming that bias is the name of the fourth column).
$bias isn't really the name of that column. $ is just another function in R, like [, that is used for accessing columns of data frames (among other things).
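To make the equivalence concrete, here is a minimal sketch. The data frame foo is a made-up stand-in with the same column names as in the question; only the two bias values are taken from the sample rows.

```r
# Toy data frame: column names mirror the question, contents are illustrative.
foo <- data.frame(
  station   = c(6019, 6019),
  date.time = c("2011-08-06 12:00", "2011-08-06 13:00"),
  member    = c("mbr003", "mbr003"),
  bias      = c(86, 34)
)

by_index   <- foo[4]         # one-column data frame, selected by position
by_name    <- foo["bias"]    # one-column data frame, selected by name
as_vector  <- foo[["bias"]]  # plain numeric vector
via_dollar <- foo$bias       # the same plain numeric vector

identical(by_index, by_name)      # position and name select the same column
identical(as_vector, via_dollar)  # [[ and $ both extract the vector itself
```

Note that [ keeps the data frame wrapper, while [[ and $ return the column's contents.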
But now I'm going to go out on a limb and offer some advice on your data structure. If each element of your nested list contains the data for a unique combination of station and member, here is a simplified toy version of your data:
dat <- expand.grid(station = rep(1:3,each = 2),member = rep(1:3,each = 2))
dat$bias <- sample(50:100,36,replace = TRUE)
tmp <- split(dat,dat$station)
tmp <- lapply(tmp,function(x){split(x,x$member)})
> tmp
$`1`
$`1`$`1`
station member bias
1 1 1 87
2 1 1 82
7 1 1 51
8 1 1 60
$`1`$`2`
station member bias
13 1 2 64
14 1 2 100
19 1 2 68
20 1 2 74
etc.
tmp is a list of length three, where each element is itself a list of length three. Each element of those inner lists is a data frame as shown above.
It's really much easier to record this kind of data as a single data frame. You'll notice I constructed it that way first (dat) and then split it twice. In this case you can rbind it all together again using code like this:
newDat <- do.call(rbind,lapply(tmp,function(x){do.call(rbind,x)}))
rownames(newDat) <- NULL
In this form, these sorts of calculations are much easier:
library(plyr)
#Find the max bias for each unique station+member
ddply(newDat,.(station,member),summarise, mx = max(bias))
station member mx
1 1 1 87
2 1 2 100
3 1 3 91
4 2 1 94
5 2 2 88
6 2 3 89
7 3 1 74
8 3 2 88
9 3 3 99
#Or maybe the max bias for each station across all members
ddply(newDat,.(station),summarise, mx = max(bias))
station mx
1 1 100
2 2 94
3 3 99
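If plyr is not available, base R's aggregate can produce the same kind of per-group summary. This is only a sketch under assumptions: newDat is rebuilt here as a toy data frame (the rep column and the seed are made up for illustration), so the numbers will not match the output above.

```r
# Toy stand-in for newDat: 3 stations x 3 members x 4 rows each (assumed layout).
set.seed(42)  # arbitrary seed, only so the example is reproducible
newDat <- expand.grid(station = 1:3, member = 1:3, rep = 1:4)
newDat$bias <- sample(50:100, nrow(newDat), replace = TRUE)

# Max bias for each unique station + member combination
by_pair <- aggregate(bias ~ station + member, data = newDat, FUN = max)

# Max bias for each station across all members
by_station <- aggregate(bias ~ station, data = newDat, FUN = max)

by_station
```

In both results the bias column holds the group maxima, one row per group.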
Here is another solution using repeated lapply.
lapply(tmp, function(x) lapply(lapply(x, '[[', 'bias'), max))
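The nested lapply above returns a list of lists, with one maximum per station and member. As a rough sketch of how that could be collapsed to one value per station, the version below uses sapply for the inner and outer steps. The toy tmp is rebuilt here with an assumed layout and seed, so treat the names and numbers as illustrative.

```r
# Rebuild a toy nested list: stations -> members -> data frames (assumed layout).
set.seed(1)  # arbitrary seed, only so the example is reproducible
dat <- expand.grid(station = 1:3, member = 1:3, rep = 1:4)
dat$bias <- sample(50:100, nrow(dat), replace = TRUE)
tmp <- lapply(split(dat, dat$station), function(x) split(x, x$member))

# For each station, a named numeric vector with the max bias of each member
member_max <- lapply(tmp, function(x) sapply(x, function(d) max(d[["bias"]])))

# For each station, the max bias across all of its members
station_max <- sapply(member_max, max)

station_max
```

Because sapply simplifies its result, station_max is a named numeric vector rather than a list.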
You may need to use [[ instead of [, but it should work fine with a string (don't use the $). Try:
var2 <- lapply( var1, `[`, 'bias' )
or
var2 <- lapply( var1, `[[`, 'bias' )
depending on whether you want each element back as a one-column data frame ([) or as a plain vector ([[).
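Putting this together with the code from the question, a minimal sketch might look like the following. The list var1 is a hypothetical stand-in for mylist[[3]] with made-up values, and sapply is swapped in for plyr's laply only to keep the example free of dependencies.

```r
# Hypothetical stand-in for mylist[[3]]: one data frame per member.
var1 <- list(
  mbr001 = data.frame(station = 6019, member = "mbr001", bias = c(12, 40)),
  mbr003 = data.frame(station = 6019, member = "mbr003", bias = c(86, 34))
)

var2 <- lapply(var1, `[[`, "bias")  # list of numeric vectors, selected by name
var3 <- sapply(var2, max)           # named vector: one maximum per member
max.value <- max(var3)              # maximum across all members of the station

max.value
```

Selecting by name also keeps the code working if the column order changes.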