Proximity matrix in Python
What is the best way to compute the distance/proximity matrix for very large sparse vectors? For example, suppose you are given the following design matrix, where each row is a 68771-dimensional sparse vector.
designMatrix <5830x68771 sparse matrix of type '' with 1229041 stored elements in Compressed Sparse Row format>
Have you tried the routines in scipy.spatial.distance?
http://docs.scipy.org/doc/scipy/reference/spatial.distance.html
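As a rough illustration of what those routines look like in use, here is a minimal sketch with pdist and squareform. The matrix X is a small random stand-in for designMatrix (its size and density are made up for the example), and the Euclidean metric is an assumption, since the question does not name one. Note that pdist works on a dense array, hence the toarray() call.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import pdist, squareform

# Small random stand-in for designMatrix (hypothetical size and density).
X = sparse.random(50, 200, density=0.05, format="csr", random_state=0)

# pdist expects a dense 2-D array, so the sparse matrix is converted first.
# This is the step that becomes expensive for a 5830 x 68771 matrix.
condensed = pdist(X.toarray(), metric="euclidean")

# squareform expands the condensed vector into a symmetric n x n matrix.
D = squareform(condensed)
print(D.shape)
```

For the full-size matrix the dense conversion alone would need 5830 * 68771 float64 values, roughly 3.2 GB, which is why the sparsity matters here.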
If this forces you to use a dense representation, you may be better off rolling your own, depending on the density of nonzero elements. You could squeeze out the zeros while keeping a map between the new and original indices, compute the pairwise distances on the remaining nonzero elements, and then use that map to translate the results back to the original indexing.
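One way to read the "squeeze out the zeros" idea is sketched below, under some assumptions: the metric is Euclidean, and "squeezing" means dropping columns that are zero in every row, since those cannot affect any pairwise distance. The helper names squeeze_zero_columns and sparse_euclidean are hypothetical, not part of scipy. The distance step swaps in a different trick from the one described above: it uses the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y so that only a sparse product X X^T is needed and the data matrix is never made dense.

```python
import numpy as np
from scipy import sparse

def squeeze_zero_columns(X):
    """Drop all-zero columns; return the reduced matrix and the index map."""
    X = sparse.csr_matrix(X, copy=True)
    X.eliminate_zeros()                    # ignore explicitly stored zeros
    kept = np.flatnonzero(X.getnnz(axis=0))
    # kept[j] is the original column index of column j in the reduced matrix.
    return X[:, kept], kept

def sparse_euclidean(X):
    """Pairwise Euclidean distances between the rows of a sparse matrix."""
    sq_norms = np.asarray(X.multiply(X).sum(axis=1)).ravel()
    gram = (X @ X.T).toarray()             # n x n, dense, but only n^2 values
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    np.maximum(d2, 0.0, out=d2)            # clip tiny negatives from rounding
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2)

# Small random stand-in for designMatrix (hypothetical size and density).
X = sparse.random(50, 200, density=0.05, format="csr", random_state=0)
X_small, kept = squeeze_zero_columns(X)
D = sparse_euclidean(X_small)
print(X.shape, X_small.shape, D.shape)
```

Because only columns are removed, row i of D still refers to row i of the original matrix; the kept array is needed only when a feature in the reduced matrix has to be traced back to its original column. The result is an n x n dense array, which for 5830 rows is about 272 MB in float64.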