R: Applying a function to all row-pairs of a matrix without a for loop
I want all pairwise comparisons for all rows in a matrix. Obviously a double for loop will work, but it is extremely expensive for a large dataset.
I looked up implicit loops like apply(), etc., but have no clue how to avoid the inner loop.
How can it be achieved?
I'm assuming you're trying to do some type of comparison across all row-pairs of a matrix.
You could use outer() to run through all pairs of row indices and apply a vectorized comparison function to each row-pair. E.g. you could calculate the squared Euclidean distance among all row-pairs as follows:
> m <- matrix(1:12,4,3)
> outer(1:4,1:4, FUN = Vectorize( function(i,j) sum((m[i,]-m[j,])^2 )) )
     [,1] [,2] [,3] [,4]
[1,]    0    3   12   27
[2,]    3    0    3   12
[3,]   12    3    0    3
[4,]   27   12    3    0
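A note on why Vectorize() appears in the call above: outer() calls FUN once with the fully expanded index vectors rather than once per pair, so a function that returns a single number has to be wrapped. A minimal sketch, where sq_dist is an illustrative helper name that is not part of the answer:

```r
m <- matrix(1:12, 4, 3)

# Illustrative helper: squared Euclidean distance between rows i and j of m.
sq_dist <- function(i, j) sum((m[i, ] - m[j, ])^2)

# outer() passes two length-16 index vectors in a single call, so the single
# number returned by sum() cannot be reshaped into a 4 x 4 matrix and
# outer() stops with an error. The error message is captured here.
plain <- tryCatch(outer(1:4, 1:4, sq_dist),
                  error = function(e) conditionMessage(e))

# Vectorize() wraps the helper so that it is called once per (i, j) pair.
d <- outer(1:4, 1:4, Vectorize(sq_dist))
```

Vectorize() still loops internally (via mapply), so this is convenient rather than fast.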
outer() works fine if you are willing to do self-comparisons, such as 1-1 and 2-2 (the diagonal values in the matrix). Also, outer() performs both the 1-2 and the 2-1 comparisons.
Most of the time, pairwise comparisons only require triangular comparisons, without the self-comparisons and mirror comparisons. To achieve triangular comparisons, use combn().
Here is a sample output to show the difference between outer() and combn():
> v <- c(1,2,3,4)
> outer(v, v, function(x, y) print(paste(x, "-", y)))
[1] "1 - 1" "2 - 1" "3 - 1" "4 - 1" "1 - 2" "2 - 2" "3 - 2" "4 - 2" "1 - 3" "2 - 3" "3 - 3" "4 - 3" "1 - 4" "2 - 4" "3 - 4" "4 - 4"
Note the "1 - 1" self-comparisons above, and the "1 - 2" and "2 - 1" mirror comparisons. Contrast that with the following:
> library(plyr) # a_ply() comes from the plyr package
> v <- c(1,2,3,4)
> allPairs <- combn(length(v), 2) # choose a pair from 1:length(v)
> a_ply(allPairs, 2, function(x) print(paste(x[1],"--",x[2]))) # iterate over all pairs
[1] "1 -- 2"
[1] "1 -- 3"
[1] "1 -- 4"
[1] "2 -- 3"
[1] "2 -- 4"
[1] "3 -- 4"
You can see the "upper triangular" part of the matrix in the above.
outer() is more apt when you have two different vectors to combine pairwise. For pairwise operations within a single vector, more often than not you can get away with combn().
For example, if you are doing outer(x,x,...) then you are perhaps doing it wrong; you should consider combn(length(x),2).
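As a sketch of how this could be applied to the rows of the matrix from the first answer: combn() takes an optional FUN argument that is applied to each pair, so the upper-triangular results come back as a plain vector. The variable names below (pairs, d_upper, d) are illustrative, and filling a full symmetric matrix is only needed if later code expects one.

```r
m <- matrix(1:12, 4, 3)
n <- nrow(m)

# Each column of pairs is one (i, j) combination with i < j.
pairs <- combn(n, 2)

# FUN is applied to each pair; here it computes the squared Euclidean
# distance between rows p[1] and p[2] of m.
d_upper <- combn(n, 2, FUN = function(p) sum((m[p[1], ] - m[p[2], ])^2))

# Optionally place the results into a full symmetric matrix.
d <- matrix(0, n, n)
d[t(pairs)] <- d_upper          # fill the (i, j) cells
d[t(pairs)[, 2:1]] <- d_upper   # mirror into the (j, i) cells
```

This evaluates the function for n(n-1)/2 pairs instead of the n^2 calls made through outer().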
Maybe not as universal a solution as @Prasad's, but much faster in this special case of a sum of squares:
dist(m)^2
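Note that dist() returns a compact "dist" object holding only the lower triangle. If a full matrix is wanted, as.matrix() expands it. The sketch below checks that the expanded result agrees with the outer() approach on the same example matrix:

```r
m <- matrix(1:12, 4, 3)

# Compact form: one value per unordered row pair.
d_compact <- dist(m)^2

# Full symmetric matrix with a zero diagonal.
d_full <- as.matrix(d_compact)

# Reference result from the outer() approach.
d_outer <- outer(1:4, 1:4,
                 Vectorize(function(i, j) sum((m[i, ] - m[j, ])^2)))

# all.equal() allows for the small rounding error introduced by taking
# a square root and then squaring again.
same <- isTRUE(all.equal(unname(d_full), d_outer))
```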
@Gopalkrishna Palem
I like your solution! However, I think you should use combn(v, 2) instead of combn(length(v), 2). combn(length(v), 2) only iterates over the indices of v:
> v <- c(3,4,6,7)
> combn(v, 2)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    3    3    3    4    4    6
[2,]    4    6    7    6    7    7
> combn(length(v), 2)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    2    2    3
[2,]    2    3    4    3    4    4
> a_ply(combn(v, 2), 2, function(x) print(paste(x[1],"--",x[2])) )
[1] "3 -- 4"
[1] "3 -- 6"
[1] "3 -- 7"
[1] "4 -- 6"
[1] "4 -- 7"
[1] "6 -- 7"
> a_ply(combn(length(v), 2), 2, function(x) print(paste(x[1],"--",x[2])) )
[1] "1 -- 2"
[1] "1 -- 3"
[1] "1 -- 4"
[1] "2 -- 3"
[1] "2 -- 4"
[1] "3 -- 4"
So the final result is correct with combn(v, 2).
Then if we have a dataframe, we can use the indices to apply a function to pairwise rows:
> df
  x  y
1 4  8
2 5  9
3 6 10
4 7 11
> a_ply(combn(nrow(df), 2), 2, function(x) print(df[x[1],] - df[x[2],]))
   x  y
1 -1 -1
   x  y
1 -2 -2
   x  y
1 -3 -3
   x  y
2 -1 -1
   x  y
2 -2 -2
   x  y
3 -1 -1
However, a_ply() discards the result, so how can I store the output in a vector for further analysis? I don't want to just print the result.
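One possible way to keep the results, sketched in base R: loop over the columns of the combn() output with lapply(), return each difference instead of printing it, and bind the pieces together. The names diffs and result, and the extra i and j columns that record which rows were compared, are illustrative additions. Within plyr, alply() is the list-returning counterpart of a_ply().

```r
df <- data.frame(x = 4:7, y = 8:11)
pairs <- combn(nrow(df), 2)

# One result per pair, kept in a list rather than printed.
diffs <- lapply(seq_len(ncol(pairs)), function(k) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  # Record which rows were compared alongside the row difference.
  data.frame(i = i, j = j, df[i, ] - df[j, ])
})

# Stack the per-pair results into one data frame for further analysis.
result <- do.call(rbind, diffs)
rownames(result) <- NULL
```

Each row of result then holds one pair, so a single column such as result$x can be used as an ordinary vector.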