scipy linregress function erroneous standard error return?

2022-12-16 03:22 问答作者：

I have a weird situation with scipy.stats.linregress seems to be returning an incorrect standard error:

from scipy import stats
x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> gradient
5.3935773611970186
>>> intercept
-16.281127993087829
>>> r_value
0.72443514211849758
>>> r_value**2
0.52480627513624778
>>> std_err
3.629090122开发者_如何学运维2878866

Whereas Excel returns the following:

 slope: 5.394

 intercept: -16.281

 rsq: 0.525

 steyX: 11.696

steyX is excel's standard error function, returning 11.696 versus scipy's 3.63. Anybody know what's going on here? Any alternative way of getting the standard error of a regression in python, without going to Rpy?

I've just been informed by the SciPy user group that the std_err here represents the standard error of the gradient line, not the standard error of the predicted y's, as per Excel. Nevertheless users of this function should be careful, because this was not always the behaviour of this library - it used to output exactly as Excel, and the changeover appears to have occurred in the past few months.

Anyway still looking for an equivalent to STEYX in Python.

You could try the statsmodels package:

In [37]: import statsmodels.api as sm

In [38]: x = [5.05, 6.75, 3.21, 2.66]

In [39]: y = [1.65, 26.5, -5.93, 7.96]

In [40]: X = sm.add_constant(x) # intercept

In [41]: model = sm.OLS(y, X)

In [42]: fit = model.fit()

In [43]: fit.params
Out[43]: array([  5.39357736, -16.28112799])

In [44]: fit.rsquared
Out[44]: 0.52480627513624789

In [45]: np.sqrt(fit.mse_resid)
Out[45]: 11.696414461570097

yes this is true - the standard estimate of the gradient is what linregress returns; the standard estimate of the estimate (Y) is related, though, and you can back-into the SEE by multiplying the standard error of the gradient (SEG) that linregress gives you: SEG = SEE / sqrt( sum of (X - average X)**2 )

Stack Exchange doesn't handle latex but the math is here if you are interested, under the "Analyze Sample Data" heading.

This will give you an equivalent to STEYX using python:

fit = np.polyfit(x,y,deg=1)
n = len(x)
m = fit[0]
c = fit[1]
y_pred = m*x+c
STEYX = (((y-y_pred)**2).sum()/(n-2))**0.5
print(STEYX)

The calculation of "std err on y" in excel is actually standard deviation of values of y.

That's the same for std err on x. The number '2' in the final step is the degree of freedom of example you given.

>>> x = [5.05, 6.75, 3.21, 2.66]
>>> y = [1.65, 26.5, -5.93, 7.96]
>>> def power(a):
        return a*5.3936-16.2811

>>> y_fit = list(map(power,x))
>>> y_fit
[10.956580000000002, 20.125700000000005, 1.032356, -1.934123999999997]
>>> var = [y[i]-y_fit[i] for i in range(len(y))]
>>> def pow2(a):
        return a**2

>>> summa = list(map(pow2,var))
>>> summa
[86.61243129640003, 40.63170048999993, 48.47440107073599, 97.89368972737596]
>>> total = 0
>>> for i in summa:
        total += i
>>> total
273.6122225845119
>>> import math
>>> math.sqrt(total/2)
11.696414463084658

继续阅读：python scipy

scipy linregress function erroneous standard error return?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？