R: make pls calibration models from n number of subset and use them to predict different test sets

Posted: 2023-03-28 22:00

I am trying to apply a function I wrote that uses the 'pls' package to build a model and then use it to predict several test sets (nine in this case), returning the R2, RMSEP and prediction bias of each test set for each of n subsets selected from the data frame. The function is:

cpo <- function(data, newdata1, newdata2, newdata3, newdata4, newdata5,
                newdata6, newdata7, newdata8, newdata9) {
  data.pls <- plsr(protein ~ ., 8, data = data, validation = "LOO")  # making a pls model
  newdata1.pred <- predict(data.pls, 8, newdata = newdata1)  # using the model to predict test sets
  newdata2.pred <- predict(data.pls, 8, newdata = newdata2)
  newdata3.pred <- predict(data.pls, 8, newdata = newdata3)
  newdata4.pred <- predict(data.pls, 8, newdata = newdata4)
  newdata5.pred <- predict(data.pls, 8, newdata = newdata5)
  newdata6.pred <- predict(data.pls, 8, newdata = newdata6)
  newdata7.pred <- predict(data.pls, 8, newdata = newdata7)
  newdata8.pred <- predict(data.pls, 8, newdata = newdata8)
  newdata9.pred <- predict(data.pls, 8, newdata = newdata9)
  pred.bias1 <- mean(newdata1.pred - newdata1[742])  # calculating the prediction bias
  pred.bias2 <- mean(newdata2.pred - newdata2[742])  # [742]: reference values are in column 742
  pred.bias3 <- mean(newdata3.pred - newdata3[742])
  pred.bias4 <- mean(newdata4.pred - newdata4[742])
  pred.bias5 <- mean(newdata5.pred - newdata5[742])
  pred.bias6 <- mean(newdata6.pred - newdata6[742])
  pred.bias7 <- mean(newdata7.pred - newdata7[742])
  pred.bias8 <- mean(newdata8.pred - newdata8[742])
  pred.bias9 <- mean(newdata9.pred - newdata9[742])
  r <- c(R2(data.pls, "train"), RMSEP(data.pls, "train"), pred.bias1,
         pred.bias2, pred.bias3, pred.bias4, pred.bias5, pred.bias6,
         pred.bias7, pred.bias8, pred.bias9)
  return(r)
}

To select n subsets (based on an answer to my earlier question, "Select several subsets by taking different row interval and appy function to all subsets") and apply the cpo function to each subset, I tried the following.

Edited based on @Gavin's advice:

FOO3 <- function(data, nSubsets, nSkip){
  outList <- vector("list", 11)
  names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
  sub <- vector("list", length = nSubsets)  # sub is the n number subsets created by selecting rows
  names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))

 totRow <- nrow(data)

  for (i in seq_len(nSubsets)) {
    rowsToGrab <- seq(i, totRow, nSkip)
      sub[[i]] <- data[rowsToGrab ,] 
  }                                                           


for(i in sub) {                                         #for every subset in sub i want to apply cpo
    outList[[i]] <- cpo(data=sub,newdata1=gag11p,newdata2=gag12p,newdata3=gag13p,  
       newdata4=gag21p,newdata5=gag22p,newdata6=gag23p,                   
       newdata7=gag31p,newdata8=gag32p,newdata9=gag33p) #new data are test sets loaded in the workspace
      }
    return(outlist)
 }

FOO3(GAGp,10,10)

When I try this I keep getting 'Error in eval(expr, envir, enclos) : object 'protein' not found'. protein is used in the plsr formula of cpo and is a column in the data set. I then tried calling the plsr function directly, as seen below:

FOO4 <- function(data, nSubsets, nSkip){
outList <- vector("list", 11)
  names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
  sub <- vector("list", length = nSubsets)
  names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))

  totRow <- nrow(data)

  for (i in seq_len(nSubsets)) {
    rowsToGrab <- seq(i, totRow, nSkip)
      sub[[i]] <- data[rowsToGrab ,] 
  }

  cal<-vector("list", length=nSubsets)  #for each subset in sub make a pls model for protein
  names(cal)<-c(paste("cal",1:nSubsets, sep=""))
  for(i in sub) {
       cal[[i]] <- plsr(protein~.,8,data=sub,validation="LOO")
       }
    return(outlist) # return is just used to end script and check if error still occurs
 }
FOO4(gagpm,10,10)

When I tried this I got the same error: 'Error in eval(expr, envir, enclos) : object 'protein' not found'. Any advice on how to deal with this and make the function work would be much appreciated.

I suspect the problem is immediately at the start of FOO3():

FOO3 <- function(data, nSubsets, nSkip) {
 outList <- vector("list", r <- c(R2(data.pls,"train"), RMSEP(data.pls,"train"), 
                   pred.bias1, pred.bias2, pred.bias3, pred.bias4, pred.bias5,
                   pred.bias6, pred.bias7, pred.bias8, pred.bias9))

Not sure what you are trying to do when creating outList, but vector() has two arguments and you seem to be assigning to r a vector of numerics that you want R to use as the length argument to vector().

Here you are using the object data.pls and this doesn't exist yet - and never will in the frame of FOO3() - it is only ever created in cpo().

Your second loop looks totally wrong - you are not assigning the output from cpo() to anything. I suspect you wanted:

outList <- vector("list", 11)
names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
....
for(i in subset) {
    outList[[i]] <- cpo(....)
}
return(outList)

But that depends on what subset is etc. You also haven't got the syntax for this loop right. You have

for(i in(subset)) {

when it should be

for(i in subset) {

And subset and data aren't great names as these are common R functions and modelling arguments.

There are lots of problems with your code. Try to start simple and build up from there.
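
As a simple starting point, here is one likely source of the 'protein' not found error in the question. Inside for(i in sub), the loop variable i is itself one of the subset data frames, yet data=sub hands the whole list of data frames to plsr(), and that list has no element called protein. Using a data frame as the index in outList[[i]] would fail for a similar reason. The sketch below reproduces this on made-up data; the toy data frame, its column names and the choice of 2 components are assumptions for illustration, not the asker's data.

```r
library(pls)

set.seed(1)
# Toy stand-in for the calibration data: 5 predictors plus the response 'protein'
toy <- as.data.frame(matrix(rnorm(60 * 5), ncol = 5))
toy$protein <- 2 * toy$V1 - toy$V2 + rnorm(60, sd = 0.1)

# Two subsets, taken the same way as in the question (here: every 2nd row)
sub <- lapply(1:2, function(i) toy[seq(i, nrow(toy), by = 2), ])
names(sub) <- paste0("sub", 1:2)

# Passing the whole list fails: it holds data frames, not a 'protein' column
res <- try(plsr(protein ~ ., 2, data = sub, validation = "LOO"), silent = TRUE)
failed <- inherits(res, "try-error")

# Looping over indices and passing one data frame at a time works
cal <- vector("list", length(sub))
names(cal) <- paste0("cal", seq_along(sub))
for (i in seq_along(sub)) {
  cal[[i]] <- plsr(protein ~ ., 2, data = sub[[i]], validation = "LOO")
}
```

Looping over seq_along(sub) keeps i an integer, so it can be used both to pick the subset (sub[[i]]) and to store the result (cal[[i]]).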

I have managed to achieve what I wanted using the code below; if there is a better way of doing it (I'm sure there must be), I'm eager to learn. This function performs the following tasks:
1. Select "n" subsets from a data frame.
2. For each subset created, fit a plsr model.
3. Use each plsr model to predict 9 test sets.
4. For each prediction, calculate the prediction bias.

far5<- function(data, nSubsets, nSkip){
   sub <- vector("list", length = nSubsets)
   names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))                   
   totRow <- nrow(data)
   for (i in seq_len(nSubsets)) {
     rowsToGrab <- seq(i, totRow, nSkip)
       sub[[i]] <- data[rowsToGrab ,]}       #sub is the subsets created
  mop<- lapply(sub,cpr2)                     #assigning output from cpr to mop
   names(mop)<-c(paste("mop", mop, sep="")) 
  return(names(mop))
 }
call: far5(data, nSubsets, nSkip)
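
The subset-selection step of far5 can also be written without the explicit for loop. This is only a sketch in base R; make_subsets and the toy data frame are hypothetical names introduced for illustration.

```r
# Hypothetical helper: build nSubsets subsets by taking every nSkip-th row,
# starting at rows 1, 2, ..., nSubsets
make_subsets <- function(df, nSubsets, nSkip) {
  subs <- lapply(seq_len(nSubsets), function(i) {
    df[seq(i, nrow(df), by = nSkip), , drop = FALSE]
  })
  names(subs) <- paste0("sub", seq_len(nSubsets))
  subs
}

toy <- data.frame(id = 1:20, x = (1:20)^2)
subs <- make_subsets(toy, nSubsets = 3, nSkip = 5)
subs$sub2$id   # rows 2, 7, 12, 17
```

The result is a named list of data frames, which can be passed straight to lapply() as far5 does with cpr2.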

The first part, selecting the subsets, is based on the answer to my question "Select several subsets by taking different row interval and appy function to all subsets". I was then able to apply the function cpr2 to the subsets using lapply instead of the for loop used previously. cpr2 is a modification of cpo: only data is supplied as an argument, and the new data to be predicted are used directly inside the function, as shown below.

cpr2<-function(data){ 
  data.pls<-plsr(protein~.,8,data=data,validation="LOO") #make plsr model       
  gag11p.pred<-predict(data.pls,8,newdata=gag11p)  #predict each test set 
  gag12p.pred<-predict(data.pls,8,newdata=gag12p)
  gag13p.pred<-predict(data.pls,8,newdata=gag13p)
  gag21p.pred<-predict(data.pls,8,newdata=gag21p)
  gag22p.pred<-predict(data.pls,8,newdata=gag22p)            
  gag23p.pred<-predict(data.pls,8,newdata=gag23p)
  gag31p.pred<-predict(data.pls,8,newdata=gag31p)
  gag32p.pred<-predict(data.pls,8,newdata=gag32p)
  gag33p.pred<-predict(data.pls,8,newdata=gag33p)                        
  pred.bias1<-mean(gag11p.pred-gag11p[742])     #calculate prediction bias      
  pred.bias2<-mean(gag12p.pred-gag12p[742])
  pred.bias3<-mean(gag13p.pred-gag13p[742])         
  pred.bias4<-mean(gag21p.pred-gag21p[742])
  pred.bias5<-mean(gag22p.pred-gag22p[742])
  pred.bias6<-mean(gag23p.pred-gag23p[742])
  pred.bias7<-mean(gag31p.pred-gag31p[742])
  pred.bias8<-mean(gag32p.pred-gag32p[742])
  pred.bias9<-mean(gag33p.pred-gag33p[742])            
r<-signif(c(pred.bias1,pred.bias2,pred.bias3,pred.bias4,pred.bias5,
      pred.bias6,pred.bias7,pred.bias8,pred.bias9),2)            
  out<-c(R2(data.pls,"train",ncomp=8),RMSEP(data.pls,"train",ncomp=8),r)
 return(out)          
}                 # signif() rounds the prediction bias to 2 significant digits

call:cpr2(data)
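
A note on the bias lines in cpr2: gag11p[742] selects a one-column data frame rather than a numeric vector, and in current versions of R mean() applied to a data frame returns NA with a warning. Extracting the column with [[ ]] avoids that. A small base R sketch with a toy data frame (all names and values here are illustrative):

```r
ref  <- data.frame(x = c(1, 2, 3), protein = c(10, 12, 14))
pred <- c(10.5, 11.5, 14.6)          # pretend predictions for the 3 rows

class(ref[2])      # "data.frame": single brackets keep the data frame
class(ref[[2]])    # "numeric": double brackets extract the column itself

bias_df  <- suppressWarnings(mean(pred - ref[2]))   # NA in current R
bias_vec <- mean(pred - ref[[2]])                   # mean of 0.5, -0.5, 0.6
```

Referring to the column by name, for example ref[["protein"]], is also less fragile than a fixed position such as 742.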

I was able to use this to solve my problem. However, it was only possible to list the test sets out by hand as I did because there were just nine of them. If there is a more general way to do this, I'm interested in learning it.
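
One possible generalisation, offered as a sketch rather than a definitive answer: keep the test sets in a named list and loop over that list, so the function no longer depends on how many test sets there are. cal_and_bias, make_toy, the simulated data and ncomp = 2 are all assumptions made for illustration; the gag11p-style names are reused only as labels.

```r
library(pls)

# Hypothetical generalisation of cpr2: test sets come in as a named list
cal_and_bias <- function(train, test_sets, ncomp) {
  fit <- plsr(protein ~ ., ncomp, data = train, validation = "LOO")
  # Prediction bias = mean(predicted - reference), one value per test set
  bias <- sapply(test_sets, function(test) {
    pred <- drop(predict(fit, ncomp = ncomp, newdata = test))
    mean(pred - test[["protein"]])
  })
  # $val holds the numbers; intercept = FALSE drops the 0-component entry
  c(R2train    = as.numeric(R2(fit, estimate = "train", ncomp = ncomp,
                               intercept = FALSE)$val),
    RMSEPtrain = as.numeric(RMSEP(fit, estimate = "train", ncomp = ncomp,
                                  intercept = FALSE)$val),
    signif(bias, 2))
}

# Simulated stand-ins for the calibration data and the test sets
set.seed(42)
make_toy <- function(n) {
  d <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
  d$protein <- 2 * d$V1 - d$V2 + rnorm(n, sd = 0.1)
  d
}
train <- make_toy(60)
tests <- list(gag11p = make_toy(15), gag12p = make_toy(15), gag13p = make_toy(15))

# One model per subset, every model applied to every test set
subs <- lapply(1:2, function(i) train[seq(i, nrow(train), by = 2), ])
out  <- t(sapply(subs, cal_and_bias, test_sets = tests, ncomp = 2))
```

Here out has one row per subset and one column per statistic, so adding a tenth test set only means adding one more element to tests.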
