computing z-scores for 2D matrices in scipy/numpy in Python

2023-01-02 06:29

How can I compute the z-score for matrices in Python?

Suppose I have the array:

a = array([[   1,    2,    3],
           [  30,   35,   36],
           [2000, 6000, 8000]])

and I want to compute the z-score for each row. The solution I came up with is:

array([zs(item) for item in a])

where zs is in scipy.stats.stats. Is there a better built-in vectorized way to do this?

Also, is it always good to z-score numbers before using hierarchical clustering with euclidean or seuclidean distance? Can anyone discuss the relative advantages/disadvantages?

thanks.

scipy.stats.stats.zs is defined like this:

def zs(a):
    mu = mean(a,None)
    sigma = samplestd(a)
    return (array(a)-mu)/sigma

So to extend it to work on a given axis of an ndarray, you could do this:

import numpy as np
import scipy.stats.stats as sss
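def my_zs(a,axis=-1):
    # Move the target axis to the end so the statistics broadcast cleanly.
    b=np.array(a).swapaxes(axis,-1)
    mu = np.mean(b,axis=-1)[...,np.newaxis]
    # samplestd divides by N, like np.std with ddof=0.
    sigma = sss.samplestd(b,axis=-1)[...,np.newaxis]
    # Swap the axes back so the result has the same layout as the input.
    return ((b-mu)/sigma).swapaxes(axis,-1)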


a = np.array([[   1,    2,    3],
              [  30,   35,   36],
              [2000, 6000, 8000]])
result=np.array([sss.zs(item) for item in a])

my_result=my_zs(a)
print(my_result)
# [[-1.22474487  0.          1.22474487]
#  [-1.3970014   0.50800051  0.88900089]
#  [-1.33630621  0.26726124  1.06904497]]
assert(np.allclose(result,my_result))
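
On a current NumPy the same thing can probably be written without the axis swap. This is only a minimal sketch: it assumes np.std with its default ddof=0 is an acceptable stand-in for the legacy samplestd (both divide by N), and the name zscore_along_axis is made up for illustration.

```python
import numpy as np

def zscore_along_axis(a, axis=-1):
    # Hypothetical helper, not part of NumPy or SciPy.
    a = np.asarray(a, dtype=float)
    # keepdims=True keeps a length-1 axis so mu and sigma broadcast against a.
    mu = a.mean(axis=axis, keepdims=True)
    sigma = a.std(axis=axis, keepdims=True)  # ddof=0, i.e. divide by N
    return (a - mu) / sigma

a = np.array([[   1,    2,    3],
              [  30,   35,   36],
              [2000, 6000, 8000]])

# One z-score per element, computed independently within each row.
row_z = zscore_along_axis(a, axis=1)
print(row_z)
```

Because the reduced axis is kept in place, the result has the same shape and axis order as the input whichever axis is chosen.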

The new zscore function in scipy.stats, available in the next release, accepts arrays of arbitrary dimension and takes an axis argument.

http://projects.scipy.org/scipy/changeset/6169
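
For the array in the question, scipy.stats.zscore would be called roughly like this. It is a short sketch, assuming a SciPy version in which zscore takes the axis and ddof arguments; the default axis is 0, so per-row scores need axis=1.

```python
import numpy as np
from scipy import stats

a = np.array([[   1,    2,    3],
              [  30,   35,   36],
              [2000, 6000, 8000]])

# axis=1 standardizes each row; ddof=0 (the default) divides by N.
row_z = stats.zscore(a, axis=1)

# axis=0 (the default) standardizes each column instead.
col_z = stats.zscore(a, axis=0)

print(row_z)
```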

