R: Fast way to replicate selected rows in a data.frame (or other data structure)
I have an object of type data.frame like this, but much bigger:
> head(mydf)
  id1     id2 n
1   0 1032142 3
2   0 1072163 1
3   0  119323 2
I need to print columns id1 and id2 to a file, with each row repeated n times, so that I get a file like this:
0 1032142
0 1032142
0 1032142
0 1072163
0 119323
0 119323
I tried the following solutions, but they use explicit for loops and are incredibly slow (they take a few days to finish on my data...). The first writes every repeated row to the file separately:
for (j in 1:nrow(mydf)) for (i in 1:mydf[j, "n"])
  write.table(mydf[j, c("id1", "id2")], file = "trials", append = TRUE, row.names = FALSE, col.names = FALSE)
The other builds a new data.frame with the replicated rows, but it is even slower:
towrite <- data.frame()
for (j in 1:nrow(mydf)) for (i in 1:mydf[j, "n"]) towrite <- rbind(towrite, mydf[j, c("id1", "id2")])
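Part of the slowdown in the rbind() loop comes from growing towrite: the whole data.frame is copied on every iteration. A minimal sketch of the usual loop-based fix, assuming the demo values from head(mydf) above, collects one block per row in a preallocated list and binds once at the end. It still loops in R, so a fully vectorized row index is likely to be faster.

```r
# Demo data with the same columns as in the question
mydf <- data.frame(id1 = c(0L, 0L, 0L),
                   id2 = c(1032142L, 1072163L, 119323L),
                   n   = c(3L, 1L, 2L))

# Preallocate a list with one slot per input row
blocks <- vector("list", nrow(mydf))
for (j in seq_len(nrow(mydf))) {
  # Row j repeated n[j] times, only the two id columns
  blocks[[j]] <- mydf[rep(j, mydf$n[j]), c("id1", "id2")]
}

# Bind all blocks in a single call instead of once per iteration
towrite <- do.call(rbind, blocks)
```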
What is the simplest and fastest way of doing this in R?
Try subsetting your data with repeated row indices and saving it in one batch:
mydf[rep(1:nrow(mydf), mydf$n), ]
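A minimal sketch of the whole pipeline, assuming the column names id1, id2 and n from the question and a hypothetical output file "out.txt". rep() with a vector times argument repeats each index by its own count, so both the row selection and the write happen only once.

```r
# Demo data with the same columns as in the question
mydf <- data.frame(id1 = c(0L, 0L, 0L),
                   id2 = c(1032142L, 1072163L, 119323L),
                   n   = c(3L, 1L, 2L))

# rep(1:3, c(3, 1, 2)) gives 1 1 1 2 3 3; rows with n == 0 are dropped
idx <- rep(seq_len(nrow(mydf)), mydf$n)

# Select the repeated rows and the two id columns in one step
expanded <- mydf[idx, c("id1", "id2")]

# Write everything in a single call ("out.txt" is a hypothetical file name)
write.table(expanded, file = "out.txt", row.names = FALSE, col.names = FALSE)
```

seq_len() is used instead of 1:nrow() so that an empty data.frame gives an empty index rather than c(1, 0).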
If your data is all numeric, working on a matrix is usually much faster:
mymat <- as.matrix(mydf)
reps <- as.integer(mydf$n)
mymat[rep(1:nrow(mymat), reps), ]
  id1     id2 n
1   0 1032142 3
1   0 1032142 3
1   0 1032142 3
2   0 1072163 1
3   0  119323 2
3   0  119323 2
If you were able to work with your original data.frame in memory, you will probably be able to handle the expanded matrix above as well.
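A sketch of writing the expanded matrix to a file, assuming only id1 and id2 are needed and using a hypothetical file name "out_matrix.txt". drop = FALSE keeps the result a matrix even if only one row is selected; without it, a single-row result would collapse to a plain vector.

```r
mydf <- data.frame(id1 = c(0L, 0L, 0L),
                   id2 = c(1032142L, 1072163L, 119323L),
                   n   = c(3L, 1L, 2L))

# Keep only the columns to write; all are integer, so the matrix stays numeric
mymat <- as.matrix(mydf[, c("id1", "id2")])
reps  <- as.integer(mydf$n)

# Repeat row indices; drop = FALSE guards against a one-row result
expanded <- mymat[rep(seq_len(nrow(mymat)), reps), , drop = FALSE]

# One write for the whole matrix ("out_matrix.txt" is a hypothetical file name)
write.table(expanded, file = "out_matrix.txt", row.names = FALSE, col.names = FALSE)
```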
If you only want to write every row n times to a file, then try:
Loading demo data:
data <- structure(list(id1 = c(0L, 0L, 0L),
id2 = c(1032142L, 1072163L, 119323L),
n = c(3L, 1L, 2L)), .Names = c("id1", "id2", "n"), class = "data.frame", row.names = c(NA, -3L))
And writing all rows n times to "output.txt":
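file <- "output.txt"
# Header line with only the two columns that are written below
write.table(data[0, 1:2], file = file, row.names = FALSE)
# apply() passes each row as a vector x; replicate() repeats the write x[3] times.
# invisible() stops the return value of apply() from being printed.
invisible(apply(data, 1, function(x) replicate(x[3], write.table(t(x[1:2]), file = file, append = TRUE, col.names = FALSE, row.names = FALSE))))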
I am sure this could be written a lot nicer :)
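The apply()/replicate() approach above reopens the file for every single line written. A sketch of a tidier loop, assuming the same demo data and file name, opens one connection, writes each row's n copies as one block, and closes the connection at the end. It is still a loop over rows, so subsetting with a repeated row index and writing once is likely to be faster on large data.

```r
data <- data.frame(id1 = c(0L, 0L, 0L),
                   id2 = c(1032142L, 1072163L, 119323L),
                   n   = c(3L, 1L, 2L))

# Open the output file once instead of once per written line
con <- file("output.txt", open = "w")

# Header with only the two columns that are written below
write.table(data[0, c("id1", "id2")], file = con, row.names = FALSE)

for (j in seq_len(nrow(data))) {
  # Row j repeated n[j] times, written as one block
  block <- data[rep(j, data$n[j]), c("id1", "id2")]
  write.table(block, file = con, row.names = FALSE, col.names = FALSE)
}

close(con)
```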
Maybe you can try apply and sink. I am not sure whether apply is actually faster than a for loop, though (tapply and lapply may be).
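mydat <- data.frame(id1 = 0, id2 = rnorm(5), n = sample(1:10, 5))
mydat
sink("test.txt")
# Each row becomes x[3] tab-separated lines, each ending in a newline.
# invisible() keeps the NULL returned by apply() from being printed into the file.
invisible(apply(mydat, 1, function(x) cat(paste0(rep(paste(x[1:2], collapse = "\t"), x[3]), "\n"), sep = "")))
sink()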
I know the code looks horrible
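A fully vectorized variant of the same idea, as a sketch: build all output lines at once with paste() and rep(), then write them with a single writeLines() call, so neither apply() nor sink() is needed. set.seed() is added only to make the random demo data reproducible.

```r
set.seed(1)  # only so the random demo data is reproducible
mydat <- data.frame(id1 = 0, id2 = rnorm(5), n = sample(1:10, 5))

# One tab-separated line per row, then each line repeated n times
lines <- rep(paste(mydat$id1, mydat$id2, sep = "\t"), mydat$n)

# writeLines() appends a newline to every element and writes in one go
writeLines(lines, "test.txt")
```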