开发者

vectorising the application of mle2 models

I have written a model that I am fitting to data using ML via the mle2 package. However, I have a large data frame of samples and I would like to fit the model to each replicate, and then retrieve all of the coefficients of the model in a data frame.

I have tried to use the ddply function in the plyr package with no success.

I get the following error message when I try:

Error in output[[var]][rng] <- df[[var]] : 
  incompatible types (from S4 to logical) in subassignment type fix

Any thoughts?

here is an example of what I am doing.

This is my data frame. I have measurement in Pond 5...n on day 1....n. Measurements consist of 143 fluxes (flux.cor), which is the variable I am modelling.

     Pond Obs                Date     Time   Temp       DO   pH U day month    PAR
932    5 932 2011-06-16 17:31:0开发者_StackOverflow0 17:31:00 294.05 334.3750 8.47 2   1     1 685.08
933    5 933 2011-06-16 17:41:00 17:41:00 294.05 339.0625 8.47 2   1     1 808.44
934    5 934 2011-06-16 17:51:00 17:51:00 294.02 340.6250 8.46 2   1     1 752.78
935    5 935 2011-06-16 18:01:00 18:01:00 294.00 340.6250 8.45 2   1     1 684.14
936    5 936 2011-06-16 18:11:00 18:11:00 293.94 340.9375 8.50 2   1     1 625.86
937    5 937 2011-06-16 18:21:00 18:21:00 293.88 341.5625 8.48 2   1     1 597.06
    day.night Treat            H  pOH           OH   DO.cor   sd.DO    av.DO   DO.sat
932         1     A 3.388442e-09 5.53 2.951209e-06 342.1406 2.63078 342.1406 274.0811
933         1     A 3.388442e-09 5.53 2.951209e-06 339.0625 2.63078 342.1406 274.0811
934         1     A 3.467369e-09 5.54 2.884032e-06 340.6250 2.63078 342.1406 274.2432
935         1     A 3.548134e-09 5.55 2.818383e-06 340.6250 2.63078 342.1406 274.3513
936         1     A 3.162278e-09 5.50 3.162278e-06 340.9375 2.63078 342.1406 274.6763
937         1     A 3.311311e-09 5.52 3.019952e-06 341.5625 2.63078 342.1406 275.0020
      DO_flux      NEP.hr  flux.cor  sd.flux    av.flux
932 -3.078125 -3.09222602 -3.078125 2.104482 -0.1070312
933  1.562500  1.54903673  1.562500 2.104482 -0.1070312
934  0.000000 -0.01375489  0.000000 2.104482 -0.1070312
935  0.312500  0.29876654  0.312500 2.104482 -0.1070312
936  0.625000  0.61126617  0.625000 2.104482 -0.1070312

here is the my model:

    # function that generates predictions of O2 flux given GPP R and gas exchange
flux.pred <- function(GPP24, PAR, R24, Temp, U, DO, DOsat){
    # calculates Schmidt coefficient from water temperature
    Sc<-function(Temp){
        S<-0.0476*(Temp)^2 + 3.7818*(Temp)^2 - 120.1*Temp + 1800.6
        }
    # calculates piston velocity k (m h-1) from wind speed at 10m (m s-1)
    k600<-function(U){
        k.600<-(2.07 + 0.215*((U)^1.7))/100 
        }
    # calculates piston velocity k (m h-1) from wind speed at 10m (m s-1)
    k<-function(Temp,U){
        k<-k600(U)*((Sc(Temp)/600)^-0.5)
        }
    # physical gas flux (mg O2 m-2 10mins-1)
    D<-function(Temp,U,DO,DOsat){
        d<-(k(Temp,U)/6)*(DO-DOsat)
    }   

  # main function to generate predictions   
flux<-(GPP24/sum(YSI$PAR[YSI$PAR>40]))*(ifelse(YSI$PAR>40, YSI$PAR, 0))-(R24/144)+D(YSI$Temp,YSI$U,YSI$DO,YSI$DO.sat)
return(flux)
}

which returns predictions for the fluxes.

I then build my likelihood function:

   # likelihood function
ll<-function(GPP24, PAR, R24, Temp, U, DO.cor, DO.sat){
    pred = (flux.pred(GPP24, PAR, R24, Temp, U, DO.cor, DOsat))
    pred = pred[-144]
    obs = YSI$flux.cor[-144]
    return(-sum(dnorm(obs, mean=pred, sd=sqrt(var(obs-pred)))))
    } 

and apply it

ll.fit<-mle2(ll,start=list(GPP24=100, R24=100))

It works beautifully for one Pond on one day, but what I want to do is apply it to all ponds on all days automatically.

I tried the ddply (as stated above)

metabolism<-ddply(YSI, .(Pond,Treat,day,month), summarise,
mle = mle2(ll,start=list(GPP24=100, R24=100)))

but had not success. I also tried just extracting the coefficients using a for loop, but this did not work either.

for(i in 1:length(unique(YSI$day))){
GPP<-numeric(length=length(unique(YSI$day)))
GPP[i]<-mle2(ll,start=list(GPP24=100, R24=100))
    }

any help would be gratefully received.


There's at least one problem with your functions : Nowhere in your function flux.pred or ll do you have an argument where you can actually specify what the data is that is used. You hardcoded it. So how on earth is any *ply supposed to guess that it needs to change YSI$... into a subset?

Next to that, as @hadley points out, ddply will not suit you. dlply might, or you might just use the classic approach of either by() or lapply(split()).

So imagine you make a function

flux.pred <- function(data, GPP24, R24){
    # calculates Schmidt coefficient from water temperature
    Sc<-function(data$Temp){
        S<-0.0476*(data$Temp)^2 ...
    ...
    }   

and a function

ll<-function(GPP24, R24, data ){
    pred = (flux.pred(data, GPP24, R24 ))
    pred = pred[-144] # check this
    obs = data$flux.cor[-144] # check this
    return(-sum(dnorm(obs, mean=pred, sd=sqrt(var(obs-pred)))))
    } 

You should then be able to do do eg :

dlply(data, .(Pond,Treat,day,month), .fun=function(i){
    mle2(ll,start=list(GPP24=100, R24=100, data=i))
})

The passing of the data argument is dependent on what you use in mle2 for the optimization. In your case, you use the default optimizer, which is optim. See ?optim for more details. The argument data=i will be passed from mle2 to optim to ll.

What I can't check, is how the optim behaves. It might even be that your function doesn't really work as you intend. Normally you should have a function ll like :

ll <- function(par, data){
    GPP24 <- par[1]
    R24 <- par[2]
    ...
}

for optim to work. But if you say it works as you wrote it, I believe you. Make sure though it actually does. I am not convinced...

On a sidenote : neither using by() / lapply(split()) nor using dlply() is the same as vectorizing. In contrary, all these constructs are intrinsic loops. On the why of using them, read : Is R's apply family more than syntactic sugar?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜