vectorising the application of mle2 models
I have written a model that I am fitting to data using ML via the mle2 package. However, I have a large data frame of samples and I would like to fit the model to each replicate, and then retrieve all of the coefficients of the model in a data frame.
I have tried to use the ddply function in the plyr package with no success.
I get the following error message when I try:
Error in output[[var]][rng] <- df[[var]] :
incompatible types (from S4 to logical) in subassignment type fix
Any thoughts?
here is an example of what I am doing.
This is my data frame. I have measurement in Pond
5...n on day
1....n. Measurements consist of 143 fluxes (flux.cor
), which is the variable I am modelling.
Pond Obs Date Time Temp DO pH U day month PAR
932 5 932 2011-06-16 17:31:0开发者_StackOverflow0 17:31:00 294.05 334.3750 8.47 2 1 1 685.08
933 5 933 2011-06-16 17:41:00 17:41:00 294.05 339.0625 8.47 2 1 1 808.44
934 5 934 2011-06-16 17:51:00 17:51:00 294.02 340.6250 8.46 2 1 1 752.78
935 5 935 2011-06-16 18:01:00 18:01:00 294.00 340.6250 8.45 2 1 1 684.14
936 5 936 2011-06-16 18:11:00 18:11:00 293.94 340.9375 8.50 2 1 1 625.86
937 5 937 2011-06-16 18:21:00 18:21:00 293.88 341.5625 8.48 2 1 1 597.06
day.night Treat H pOH OH DO.cor sd.DO av.DO DO.sat
932 1 A 3.388442e-09 5.53 2.951209e-06 342.1406 2.63078 342.1406 274.0811
933 1 A 3.388442e-09 5.53 2.951209e-06 339.0625 2.63078 342.1406 274.0811
934 1 A 3.467369e-09 5.54 2.884032e-06 340.6250 2.63078 342.1406 274.2432
935 1 A 3.548134e-09 5.55 2.818383e-06 340.6250 2.63078 342.1406 274.3513
936 1 A 3.162278e-09 5.50 3.162278e-06 340.9375 2.63078 342.1406 274.6763
937 1 A 3.311311e-09 5.52 3.019952e-06 341.5625 2.63078 342.1406 275.0020
DO_flux NEP.hr flux.cor sd.flux av.flux
932 -3.078125 -3.09222602 -3.078125 2.104482 -0.1070312
933 1.562500 1.54903673 1.562500 2.104482 -0.1070312
934 0.000000 -0.01375489 0.000000 2.104482 -0.1070312
935 0.312500 0.29876654 0.312500 2.104482 -0.1070312
936 0.625000 0.61126617 0.625000 2.104482 -0.1070312
here is the my model:
# function that generates predictions of O2 flux given GPP R and gas exchange
flux.pred <- function(GPP24, PAR, R24, Temp, U, DO, DOsat){
# calculates Schmidt coefficient from water temperature
Sc<-function(Temp){
S<-0.0476*(Temp)^2 + 3.7818*(Temp)^2 - 120.1*Temp + 1800.6
}
# calculates piston velocity k (m h-1) from wind speed at 10m (m s-1)
k600<-function(U){
k.600<-(2.07 + 0.215*((U)^1.7))/100
}
# calculates piston velocity k (m h-1) from wind speed at 10m (m s-1)
k<-function(Temp,U){
k<-k600(U)*((Sc(Temp)/600)^-0.5)
}
# physical gas flux (mg O2 m-2 10mins-1)
D<-function(Temp,U,DO,DOsat){
d<-(k(Temp,U)/6)*(DO-DOsat)
}
# main function to generate predictions
flux<-(GPP24/sum(YSI$PAR[YSI$PAR>40]))*(ifelse(YSI$PAR>40, YSI$PAR, 0))-(R24/144)+D(YSI$Temp,YSI$U,YSI$DO,YSI$DO.sat)
return(flux)
}
which returns predictions for the fluxes.
I then build my likelihood function:
# likelihood function
ll<-function(GPP24, PAR, R24, Temp, U, DO.cor, DO.sat){
pred = (flux.pred(GPP24, PAR, R24, Temp, U, DO.cor, DOsat))
pred = pred[-144]
obs = YSI$flux.cor[-144]
return(-sum(dnorm(obs, mean=pred, sd=sqrt(var(obs-pred)))))
}
and apply it
ll.fit<-mle2(ll,start=list(GPP24=100, R24=100))
It works beautifully for one Pond on one day, but what I want to do is apply it to all ponds on all days automatically.
I tried the ddply (as stated above)
metabolism<-ddply(YSI, .(Pond,Treat,day,month), summarise,
mle = mle2(ll,start=list(GPP24=100, R24=100)))
but had not success. I also tried just extracting the coefficients using a for loop, but this did not work either.
for(i in 1:length(unique(YSI$day))){
GPP<-numeric(length=length(unique(YSI$day)))
GPP[i]<-mle2(ll,start=list(GPP24=100, R24=100))
}
any help would be gratefully received.
There's at least one problem with your functions : Nowhere in your function flux.pred or ll do you have an argument where you can actually specify what the data is that is used. You hardcoded it. So how on earth is any *ply supposed to guess that it needs to change YSI$...
into a subset?
Next to that, as @hadley points out, ddply will not suit you. dlply
might, or you might just use the classic approach of either by()
or lapply(split())
.
So imagine you make a function
flux.pred <- function(data, GPP24, R24){
# calculates Schmidt coefficient from water temperature
Sc<-function(data$Temp){
S<-0.0476*(data$Temp)^2 ...
...
}
and a function
ll<-function(GPP24, R24, data ){
pred = (flux.pred(data, GPP24, R24 ))
pred = pred[-144] # check this
obs = data$flux.cor[-144] # check this
return(-sum(dnorm(obs, mean=pred, sd=sqrt(var(obs-pred)))))
}
You should then be able to do do eg :
dlply(data, .(Pond,Treat,day,month), .fun=function(i){
mle2(ll,start=list(GPP24=100, R24=100, data=i))
})
The passing of the data argument is dependent on what you use in mle2 for the optimization. In your case, you use the default optimizer, which is optim
. See ?optim
for more details. The argument data=i
will be passed from mle2
to optim
to ll
.
What I can't check, is how the optim behaves. It might even be that your function doesn't really work as you intend. Normally you should have a function ll like :
ll <- function(par, data){
GPP24 <- par[1]
R24 <- par[2]
...
}
for optim to work. But if you say it works as you wrote it, I believe you. Make sure though it actually does. I am not convinced...
On a sidenote : neither using by()
/ lapply(split())
nor using dlply()
is the same as vectorizing. In contrary, all these constructs are intrinsic loops. On the why of using them, read : Is R's apply family more than syntactic sugar?
精彩评论