R dichotomize sparse matrix

I have a large 500x53380 sparse matrix and am trying to dichotomize it. I tried "event2dichot" from the sna package, but without success, because it requires an adjacency matrix or network object.

I also tried writing a simple algorithm like this:

for (i in 1:500)
  for (j in 1:53380)
    if (matrix[i, j] > 0) matrix[i, j] <- 1

This seems to work, but because the matrix is so large it is very slow: it has been running for a few hours already and is still computing as I write this question.

Do you know a better method or hack to accomplish this task?

thanks all.


Although your question is about sparse matrices, it seems to me your code actually describes a standard matrix.

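If this is the case, you can process a 500x53380 matrix in seconds. The following code makes use of the fact that a matrix is internally stored in R as a vector with a dim attribute. This means you can apply a single vectorised operation over the entire matrix. The code also saves and restores the matrix dimensions; with indexed assignment such as mat[mat>0] <- 1 this is not strictly required, because the dim attribute is preserved, but it is harmless and guards against operations that do drop the dimensions.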

Here is an illustration with a much smaller matrix:

mr <- 5
mc <- 8

mat <- matrix(round(rnorm(mr*mc), 3), nrow=mr)
mat

       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] -1.477  1.773  1.630 -0.152  1.054  0.057 -1.260  0.999
[2,] -1.863 -0.312 -0.221 -0.102  0.892 -1.255  0.996 -0.193
[3,] -0.364 -0.059  2.317  1.156  0.893  0.225  0.392 -1.986
[4,] -1.123 -0.661  0.070  0.032  0.019 -1.763 -0.205  0.951
[5,] -0.111 -3.112 -0.970 -0.794 -1.372 -0.119  1.291 -0.680

mydim <- dim(mat)
mat[mat>0] <- 1
mat[mat<0] <- 0
dim(mat) <- mydim
mat

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    1    1    0    1    1    0    1
[2,]    0    0    0    0    1    0    1    0
[3,]    0    0    1    1    1    1    1    0
[4,]    0    0    1    1    1    0    0    1
[5,]    0    0    0    0    0    0    1    0

Repeating this entire process for a 500x53380 matrix takes ~12 seconds on my machine:

mr <- 500
mc <- 53380

system.time({
  mat <- matrix(round(rnorm(mr*mc), 3), nrow=mr)
  mydim <- dim(mat)
  mat[mat>0] <- 1
  mat[mat<0] <- 0
  dim(mat) <- mydim
})

   user  system elapsed 
  12.25    0.42   12.88 
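As a quick sanity check, the vectorised approach can be compared against an explicit double loop on a small matrix. This is only a minimal sketch: the names small, looped and vectorised are made up for illustration, and the loop is extended to set non-positive entries to 0 so that both versions do the same thing.

```r
set.seed(1)
small <- matrix(round(rnorm(20), 3), nrow = 4)

# Explicit double loop: visit every cell individually
looped <- small
for (i in seq_len(nrow(looped))) {
  for (j in seq_len(ncol(looped))) {
    looped[i, j] <- if (looped[i, j] > 0) 1 else 0
  }
}

# Vectorised: the comparison yields a logical matrix of the same shape;
# multiplying by 1 converts TRUE/FALSE to 1/0 and keeps the dimensions
vectorised <- (small > 0) * 1

identical(looped, vectorised)
dim(vectorised)
```

The one-line form (small > 0) * 1 builds a new matrix rather than modifying the old one in place, which avoids the second assignment for the negative values.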


Think vectorised, and use just the indices. E.g.:

mat <- matrix(0, nrow = 500, ncol = 53380)
set.seed(7)
fill <- sample(500*53380, 10000)
mat[fill] <- sample(1:10, length(fill), replace = TRUE)  # fill the chosen cells with values 1..10

One can then dichotomize using:

mat[mat > 0] <- 1

Which is pretty quick on my workstation:

> system.time(mat[mat > 0] <- 1)
   user  system elapsed 
  1.680   0.166   1.875


If you use the Matrix package and the matrix is called, say, Mat, then you can operate on Mat@x as a vector. For example: ix_low <- (Mat@x < threshold), then Mat@x[ix_low] <- 0 and Mat@x[!ix_low] <- 1.

The key is that you are thinking about sparse matrices in the wrong way. A typical representation is (i, j, value).

You only need to touch the value vector; don't iterate over anything else.
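The idea of touching only the value vector might look like the sketch below. It assumes the Matrix package is installed; the matrix M, its values, and the threshold of 0 are made up for illustration and are not from the question.

```r
library(Matrix)

# Small sparse matrix built from (i, j, value) triplets; values are illustrative
M <- sparseMatrix(i = c(1, 3, 2, 4), j = c(1, 2, 4, 5),
                  x = c(2.5, -1, 7, 0.3), dims = c(4, 5))

threshold <- 0  # hypothetical cut-off

# Work on the stored values only; the index slots are left alone
D <- M
D@x <- as.numeric(D@x > threshold)

# Values set to 0 are still stored explicitly; drop0() removes them
D <- drop0(D)
D
```

Because only the stored entries are visited, the cost depends on the number of non-zero values rather than on the full 500x53380 grid.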


A simple way to do this with a formally defined sparse matrix (i.e. one created with the 'Matrix' package, with a capital M, rather than a base R 'matrix') is to coerce the matrix to a logical matrix using the as command, then back to a numeric or integer matrix.
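A minimal sketch of that coercion, assuming the Matrix package and its virtual classes "lMatrix" (logical) and "dMatrix" (numeric); the matrix M and its values are invented for the example. Note that the logical coercion marks every stored non-zero entry as TRUE, so negative values would also become 1.

```r
library(Matrix)

# Illustrative sparse matrix with three stored, positive values
M <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 4),
                  x = c(2.5, 4, 7), dims = c(3, 4))

L <- as(M, "lMatrix")  # stored entries become TRUE
D <- as(L, "dMatrix")  # TRUE becomes 1, sparsity is kept
D
```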
