from for loop to apply
I am new in using R. So I am not sure about how to use apply. I would like to speed up my function with using apply:
for(i in 1: ncol(exp)){
for (j in 1: length(fe)){
tmp =TRUE
id = strsplit(colnames(exp)[i],"\\.")
if(id == fe[j]){
tmp = FALSE
}
if(tmp ==TRUE){
only = cbind(only,c(names(exp)[i],exp[,i]) )
}
}
}
How can I use the apply function to do this above?
EDIT :
Thank you so much for the ve开发者_如何学Cry good explanation and sorry for my bad description. You guess everything right, but When wanted to delete matches in fe.
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]
then the result should be only colnames with 'C'. Everything else should be deleted.
1 C.z
2 11
3 12
4 13
5 14
6 15
7 16
8 17
9 18
10 19
11 20
EDIT : If you only want to delete the columns whose name appear in fe, you can simply do :
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]
id <- sapply(strsplit(names(Exp),"\\."),
function(i)!i[1] %in% fe)
Exp[id]
This code does exactly what your (updated) for-loop does as well, only a lot more efficient. You don't have to loop through fe, the %in%
function is vectorized.
In case the name can appear anywhere between the dots, then
id <- sapply(strsplit(names(Exp),"\\."),
function(i)sum(i %in% fe)==0)
Your code does some very funny things, and I have no clue what exactly you're trying to do. For one, strsplit
gives a list, so id == fe[j]
will always return false, unless fe[j]
is a list itself. And I doubt it is... So I'd correct your code as
id = strsplit(colnames(Exp)[i],"\\.")[[1]][1]
in case you want to compare with everything that is before the dot, or to
id = unlist(strsplit(colnames(Exp)[i],"\\."))
if you want to compare with everything in the string. In that case, you should use %in%
instead of ==
as well.
Second, what you get is a character matrix, which essentially multiplies rows. if all elements in fe[j] are unique, you could as well do :
only <- rbind(names(exp),exp)
only <- do.call(cbind,lapply(mat,function(x)
matrix(rep(x,ncol(exp)-1),nrow=nrow(exp)+1)
))
Assuming that the logic in your code does make sense (as you didn't apply some sample data this is impossible to know), the optimalization runs :
mat <- rbind(names(Exp),Exp)
do.call(cbind,
lapply(mat, function(x){
n <- sum(!fe %in% strsplit(x[1],"\\.")[[1]][1])
matrix(rep(x,n),nrow=nrow(mat))
}))
Note that - in case you are interested if fe[j] appears anywhere in the name - you can change the code to :
do.call(cbind,
lapply(mat, function(x){
n <- sum(!fe %in% unlist(strsplit(x[1],"\\.")))
matrix(rep(x,n),nrow=nrow(mat))
}))
If this doesn't return what you want, then your code doesn't do that either. I checked with following sample data, and all gives the same result :
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe <- LETTERS[1:4]
The apply()
family of functions are convenience functions. They will not necessarily be faster than a well-written for loop or vectorized functions. For example:
set.seed(21)
x <- matrix(rnorm(1e6),5e5,2)
system.time({
yLoop <- x[,1]*0 # preallocate result
for(i in 1:NROW(yLoop)) yLoop[i] <- mean(x[i,])
})
# user system elapsed
# 13.39 0.00 13.39
system.time(yApply <- apply(x, 1, mean))
# user system elapsed
# 16.19 0.28 16.51
system.time(yRowMean <- rowMeans(x))
# user system elapsed
# 0.02 0.00 0.02
identical(yLoop,yApply,yRowMean)
# TRUE
The reason your code is so slow is that--as Gavin pointed out--you're growing your array for every loop iteration. Preallocate the entire array before the loop and you will see a significant speedup.
精彩评论