R dichotomize sparse matrix
I have a large 500x53380 sparse matrix and am trying to dichotomize it. I tried "event2dichot" from the sna package, but without success, because it requires an adjacency matrix or network object.
I also tried writing a simple loop like this:
for ( i in 1:500)
for (j in 1:53380)
if (matrix[i,j]>0) matrix[i,j]=1
This seems to work, but since the matrix is very large it is slow: it has been running for a few hours so far and is still computing as I write this question.
Do you know a better method or hack to accomplish this task?
Thanks, all.
Although your question is about sparse matrices, it seems to me your code actually describes a standard matrix.
If this is the case, you can process a 500x53380 matrix in seconds. The following code makes use of the fact that a matrix is internally stored in R as a vector with a dim attribute. This means you can apply a single vectorised operation over the entire matrix. The code below also saves and restores the matrix dimensions as a precaution, although indexed assignment of this kind keeps them intact.
Here is an illustration with a much smaller matrix:
mr <- 5
mc <- 8
mat <- matrix(round(rnorm(mr*mc), 3), nrow=mr)
mat
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] -1.477  1.773  1.630 -0.152  1.054  0.057 -1.260  0.999
[2,] -1.863 -0.312 -0.221 -0.102  0.892 -1.255  0.996 -0.193
[3,] -0.364 -0.059  2.317  1.156  0.893  0.225  0.392 -1.986
[4,] -1.123 -0.661  0.070  0.032  0.019 -1.763 -0.205  0.951
[5,] -0.111 -3.112 -0.970 -0.794 -1.372 -0.119  1.291 -0.680
mydim <- dim(mat)
mat[mat>0] <- 1
mat[mat<0] <- 0
dim(mat) <- mydim
mat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    1    1    0    1    1    0    1
[2,]    0    0    0    0    1    0    1    0
[3,]    0    0    1    1    1    1    1    0
[4,]    0    0    1    1    1    0    0    1
[5,]    0    0    0    0    0    0    1    0
Repeating this entire process for a 500x53380 matrix takes ~12 seconds on my machine:
mr <- 500
mc <- 53380
system.time({
mat <- matrix(round(rnorm(mr*mc), 3), nrow=mr)
mydim <- dim(mat)
mat[mat>0] <- 1
mat[mat<0] <- 0
dim(mat) <- mydim
})
user system elapsed
12.25 0.42 12.88
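As a side note, the two assignments can probably be collapsed into one step by comparing and then converting the logical result back to numbers. This is only a minimal sketch on a small hand-made matrix; the names m and d and the values are illustrative and not taken from the question:

```r
# Small illustrative matrix (values chosen by hand, filled column by column)
m <- matrix(c(-1.5, 0, 2.3, 0.1, -0.2, 4), nrow = 2)

# m > 0 gives a logical matrix of the same shape; multiplying by 1
# turns TRUE/FALSE into 1/0 and keeps the dim attribute
d <- (m > 0) * 1

d
```

Zeros and negative values both end up as 0 here, which matches the two-step version above.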
Think vectorised, and use just the indices. E.g.:
mat <- matrix(0, nrow = 500, ncol = 53380)
set.seed(7)
fill <- sample(500*53380, 10000)
# fill the chosen cells with random values between 1 and 10
mat[fill] <- sample(1:10, length(fill), replace = TRUE)
One can then dichotomize using:
mat[mat > 0] <- 1
Which is pretty quick on my workstation:
> system.time(mat[mat > 0] <- 1)
user system elapsed
1.680 0.166 1.875
If you use the Matrix package, and the matrix is, say, Mat, then you can operate on Mat@x as a vector.
E.g. ix_low <- (Mat@x < threshold), then Mat@x[ix_low] = 0, Mat@x[!ix_low] = 1.
The key is that you're thinking in the wrong way when looking at sparse matrices. A typical representation is (i,j,value).
You're only looking at touching the value vector - don't iterate over anything else.
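The idea of touching only the value vector might look like the sketch below. It assumes the Matrix package is installed; the matrix M, its entries, and the cutoff threshold are made up for illustration. Note that values set to 0 stay stored as explicit zeros until they are dropped, here with drop0():

```r
library(Matrix)

# Small illustrative sparse matrix given as (i, j, value) triplets
M <- sparseMatrix(i = c(1, 2, 3, 3), j = c(1, 3, 2, 4),
                  x = c(5, -2, 0.5, 7), dims = c(3, 4))

threshold <- 0  # hypothetical cutoff; stored values above it become 1

# Only the stored values are rewritten; row and column indices are untouched
M@x <- as.numeric(M@x > threshold)

# Remove the entries that are now explicit zeros so the matrix stays sparse
M <- drop0(M)

M
```

The work is proportional to the number of stored entries rather than to the full 500x53380 grid, which is the point of this approach.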
A simple way to do this with a formally defined sparse matrix (i.e. one created with the 'Matrix' package, with a capital M, rather than a base 'matrix') is to coerce the matrix to a logical one using the as command, then back to a numeric or integer matrix.
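A rough sketch of that coercion route, assuming the Matrix package and its virtual classes "lMatrix" (logical) and "dMatrix" (numeric); the matrix M and its values are invented for illustration. Be aware that this marks every nonzero stored value as 1, negatives included, so it is not the same as testing for values greater than 0:

```r
library(Matrix)

# Small illustrative sparse matrix; one stored value is negative on purpose
M <- sparseMatrix(i = c(1, 2, 3), j = c(2, 1, 3),
                  x = c(4, -3, 0.25), dims = c(3, 3))

L <- as(M, "lMatrix")  # stored values become TRUE when they are nonzero
D <- as(L, "dMatrix")  # TRUE becomes 1, giving a numeric sparse matrix again

D
```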