avoidind a loop in R

2023-01-09 01:25 问答作者：

I will like to avoid a loop in the following code:

delta_S <- function(Ro, Rr, Ir, S1, S2, S3, S4, chromaty) { .... etc .....}

for (i in 1:nrow(Rrecon)) {
    gri[i, 6] <- delta_S(Ro=as.vector(Rrecon[i, ]), Rr=data_base$bck, Ir=data_base$Ir, S1=data_base$s1, S2=data_base$s2, S3=data_base$s3, S4=data_base$s4, chromaty="tetra")
}

My problem is that in my dataset, i varies from 1 to more than a million. I guess by using something like apply I could save time compared to the 19h currently required.

Thank you for any reply!

Update:

I have read some info about vectorization but it seems largely out of my skills (and I do not even speak about using C ou FORTRAN: I have learned R for three months only, to resolve this study). So by chance, if somebody has some time to look at in details my code and to propose a vectorization if it is possible, it would be hugely appreciated! Thank you Julien PS I copy the entire code because I do not know where it can be vectorize!

data<-read.table("databird.txt",sep="\t",dec=".",header=T,row.names=1)

#perform a PCA
pca<-dudi.pca(data, center = FALSE, scale = FALSE, scannf = FALSE, nf = ncol(data))

#definition of some functions
Q <- function(R,Ir,Si){
  temp <- R/100*Ir*Si
  return(sum((temp[2:length(temp)]+temp[1:(length(temp)-1)])/2))
  }

delta_f <- function(Ro,Rr,Ir,Si) {
  Qo <- Q(Ro,Ir,Si)
  Qr <- Q(Rr,Ir,Si)
  #if(Qo/Qr <0) {print("bug: Qo/Qr < 0, bug log neg"); return(NaN)}
  return(log(Qo/Qr))
  }

delta_S <- function(Ro,Rr,Ir,S1,S2,S3,S4,chromaty){
  if(chromaty=="tetra"){
    e1 <- 0.1
    e2 <- 0.07
    e3 <- 0.07
    e4 <- 0.05
    delta_f1 <- delta_f(Ro,Rr,Ir,S1)
    delta_f2 <- delta_f(Ro,Rr,Ir,S2)
    delta_f3 <- delta_f(Ro,Rr,Ir,S3)
    delta_f4 <- delta_f(Ro,Rr,Ir,S4)
    numerator <- (e1*e2)^2*(delta_f4-delta_f3)^2+(e1*e3)^2*(delta_f4-delta_f2)^2+(e1*e4)^2*(delta_f2-delta_f3)^2+(e2*e3)^2*(delta_f4-delta_f1)^2+(e2*e4)^2*(delta_f3-delta_f1)^2+(e3*e4)^2*(delta_f2-delta_f1)^2
    denominator <- (e1*e2*e3)^2+(e1*e2*e4)^2+(e1*e3*e4)^2+(e2*e3*e4)^2
    return(sqrt(numerator/denominator))
    }
  if(chromaty=="tri"){
    e1 <- 0.0425
    e2 <- 0.02
    e3 <- 0.02
    delta_f1 <- delta_f(Ro,Rr,Ir,S1)
    delta_f2 <- delta_f(Ro,Rr,Ir,S2)
    delta_f3 <- delta_f(Ro,Rr,Ir,S3)
    numerator <- (e1^2*(delta_f3-delta_f2)^2+e2^2*(delta_f3-delta_f1)^2+e3^2*(delta_f1-delta_f2)^2)
    denominator <- ((e1*e2)^2+(e1*e3)^2+(e2*e3)^2)
    return(sqrt(numerator/denominator))
    }
  if(chromaty=="di"){
    e1 <- 0.06
    e2 <- 0.02
    delta_f1 <- delta_f(Ro,Rr,Ir,S1)
    delta_f2 <- delta_f(Ro,Rr,Ir,S2)
    numerator <- (delta_f1-delta_f2)^2
    denominator <- (e1^2+e2^2)
    return(sqrt(numerator/denominator))
    }
  print("specification du niveau de chromaty incorrect")
  return(NaN)
  }

reconstBIS<-function (dudi, nf = 1,scoregrid, ...) 
    {
    if (!inherits(dudi, "dudi")) 
        stop("Object of class 'dudi' expected")
    if (nf > dudi$nf) 
        stop(paste(nf, "factors need >", dudi$nf, "factors available\n"))
    if (!inherits(dudi, "pca")) 
        stop("Object of class 'dudi' expected")
    cent <- dudi$cent
    norm <- dudi$norm
    n <- nrow(scoregrid)
    p <- ncol(dudi$tab)
    res <- matrix(0, n, p)
    for (i in 1:nf) {
        xli <- scoregrid[, i]
        yc1 <- dudi$c1[, i]
        res <- res + matrix(xli, n, 1) %*% matrix(yc1, 1, p)
    }
    res <- t(apply(res, 1, function(x) x * norm))
    res <- t(apply(res, 1, function(x) x + cent))
    res <- data.frame(res)
    names(res) <- names(dudi$tab)
    row.names(res) <- paste(scoregrid[,1],scoregrid[,2],scoregrid[,3],scoregrid[,4],scoregri    [,5],sep="/")
    return(res)
    }

#creation of a grid with values to enter in the function delta_S
ran1<-seq(-410,-400,2)
ran2<-seq(110,120,2)
ran3<-seq(10,20,2)
ran4<-seq(-40,-30,2)
ran5<-seq(10,20,2)
Length<-length(ran1)*length(ran2)*length(ran3)*length(ran4)*length(ran5)
print(Length)
gri <- matrix(NA,ncol=6,nrow=Length)
position <- 1
  for(i in 1:length(ran1)) {
     for(j in 1:length(ran2)){
        for(k in 1:length(ran3)){
          for(l in 1:length(ran4)){
            for(m in 1:length(ran5)){
                gri[position,1] <- ran1[i]
                gri[position,2] <- ran2[j]
                gri[position,3] <- ran3[k]开发者_如何转开发
                gri[position,4] <- ran4[l]
                gri[position,5] <- ran5[m]
                position <- position + 1
                if(position %% 100 == 0) print(paste(round(100*position/Length,2),"%"))
              }
            }
          }
        }
      };gc()

Rrecon<-reconstBIS(pca,5,scoregrid=gri)
data_base <- read.table("data_base.txt",header=T,dec=".",sep="\t")

And here's the final loop to avoid:

for (i in 1:nrow(Rrecon)) {
gri[i,6]<- delta_S(Ro=as.vector(Rrecon[i,]),Rr=data_base$bck,Ir=data_base$Ir,S1=data_base$s1,S2=data_base$s2,S3=data_base$s3,S4=data_base$s4,chromaty="tetra")
if(i %% 100 == 0) print(paste(round(100*i/nrow(Rrecon),2),"%"))
}

Thank you very much !

Changing loop to apply don't speed up your code.
You should vectorize delta_S function or use parallel processing (if you have multi-core processor).

There are a few SO posts you might want to read on loops/vectorisation in R.

I agree with Marek - try vectorising your function. Another option is to rewrite the time-consuming parts in C or FORTRAN, then load them as shared objects.

You want more speed...

The suggestions to vectorize your function Delta_S are all well intentioned and would be great if it could be done. I'm not sure it can. Nevertheless, it's a little hard for me to see. It seems to me you need to combine columns of a data frame and rows of a matrix in your final outcome. This is going to be time consuming unless you can solve the row or column problems first. I'll get to that in a minute...

Your creation of your gri only requires you to enter (after the ran variables)

gri <- expand.grid(ran1,ran2,ran3,ran4,ran5)
gri[,6] <- NA
gri <- as.matrix(gri)

That's a lot of lines of code removed right there.

You have several vectors that are essentially constants in your code. They are taken from the database but used as vectors repeatedly (data_base$bck, data_base$Ir, data_base$s1, etc). Each needs to be solved once for that entire for loop. The Qr variable only needs to be solved once for all of Rrecon, not for each line. The denominator only needs to be solved once for all of Rrecon, not for each line... etc. Break the problem down doing all of that first. Then apply to your rows of Rrecon.

While it is sometimes true that apply doesn't save you much over a for loop there are various apply family commands that are very much faster than others. And they almost all, almost always, do save some time. Some of them save loads of time. You'll also be surprised to discover that applying small functions in a vector like syntax (thus implying many small for loops) is faster than applying a large function in a C like syntax.

Oh, and the short answer to getting rid of your final for loop (one of many that should be removed) is...

gri[,6]<- apply (Rrecon, 1, function(x){
    delta_S(Ro=as.vector(x)
    ,Rr=data_base$bck, Ir=data_base$Ir, S1=data_base$s1
    ,S2=data_base$s2, S3=data_base$s3, S4=data_base$s4
    ,chromaty="tetra")
    })

That may not get you a big speedup by itself. It would be much faster if you were just passing the numerator and denominator. But that would require separate apply family loops beforehand to solve each little sub calculations in your code (solve delta_f values, then solve numerator, etc).

You might want to also read the RInferno

You have a very simple calculation performed many, many times. I don't think this is vectorizable (although maybe if you posted the source formula, someone could do it in ten lines, but it's really hard to reverse engineer from your 100).

Two general suggestions, that I think are good habits:

(1) for every iteration of this loop, which occurs 10^6 times, you're storing a whole set of constants (the e_, ran_, Length, and gri). Try calculating those outside the loop.

(2) you also do a lot of a <- 2*b, c <- 3*d, e <- a/c calculations. Replace these three equations with one e <- 2*b / 3*d. You will be surprised how much time it saves when done 10^6.

There's a lot of good stuff in there that I'm sure helped prototyping and readability, but if you're doing 10^6 iterations, you really need to trim it down.

继续阅读：loops optimization r

avoidind a loop in R

Update:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Update:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？