Python: Equivalent to Matlab's svds(A, k) for large arrays?

2023-02-10 05:52 问答作者：

I'm trying to port some code from Matlab to Python and I've run into a problem. I can't seem to find an equivalent to svds.

I tried using numpy.corrcoef and then numpy.linalg.eig, but numpy.corrcoef doesn't work for large arrays (say 500 x 20000).

Here's the code in matlab, in case it makes any difference:

s = size(data, 2)
mean = sum(data, 2)/s
m_data = ( data - repmat(mean, 1, s) ) / sqrt(s - 1)
[re开发者_开发技巧s_u,res_s] = svds(m_data, s)
eigenvals = diag(res_s).^2
eigenvecs = res_u

So what you are looking for, would be something like this in python and numpy (I took the liberty not to 'directly translate' the matlab-code to python and numpy, instead I refactored it a little bit to 'feel' more pythonic [of'course quite similar refactoring can be applied to the matlab-code as well]):

import numpy as np

def _cas(D):
    """Center at mean and standardize."""
    return (D- D.mean(1)[:, None])/ (D.shape[1]- 1)** .5

def example(D):
    """Eigenvalues and -vectors, based on SVD."""
    u, s, v= np.linalg.svd(D, full_matrices= False);
    return np.diag(s)** 2, u

if __name__ == '__main__':
    data= np.random.rand(5, 20)
    data= _cas(data) # preprocess data according to your requirements
    eigenvals, eigenvecs= example(data)
    print eigenvals
    print eigenvecs

But you have a performance problem with it?

Can you now be more specific of your current performance, and how much you really to expect it should be enhanced? FTIW in my modest computer, a random (500, 20000) matrix will spend executing example(.) some 20 secs.

I can trivially (due to basic linear algebra) reduce the execution time down to a level like 2.5 sec (allmost 10 fold improvement)! Now, if you are looking for much better performance than this, then please elaborate in much more detailed level on the 'nature' of your data!

Where is your data coming from? What is your specific case to utilize the calculated eigenvalues, and -vectors? i.e. what is your main goal to achieve?

If your matrix is sparse, you can use scipy.sparse.linalg.svds in a recent scipy distribution. It wraps the sparse SVD routine in ARPACK. In my experience, it is buggy (not sure if this is the fault of scipy or ARPACK), so I advise running tests to check that the singular values are as expected.

If your matrix is dense, 500 by 20000 is big but not intractable on a commodity desktop these days. You are calling numpy.linalg.svd with full_matrices=False, right?

What are you using numpy.corrcoef for? Nothing like that appears in the MATLAB code snippet that you posted.

继续阅读：python

Python: Equivalent to Matlab's svds(A, k) for large arrays?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？