
How to find a best fit distribution function for a list of data?

I am aware of the many probability distribution functions built into Python through the random module.

I'd like to know if, given a list of floats, it would be possible to find the distribution equation that best fits the list?

I don't know if numpy does it, but this function could be compared (not equal, but similar) with Excel's "Trend" function.

How would I do that?


Look at numpy.polyfit

numpy.polyfit(x, y, deg, rcond=None, full=False)

Least squares polynomial fit.

Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients p that minimises the squared error.
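
To make the description above concrete, here is a minimal sketch. The sample points are made up for illustration (they lie exactly on 2x^2 - 3x + 1), so the fit should recover those coefficients up to floating-point error.

```python
import numpy as np

# Illustrative data only: points that lie exactly on y = 2x^2 - 3x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x**2 - 3.0 * x + 1.0

# Least-squares fit of a degree-2 polynomial.
# Coefficients come back highest power first: [p0, p1, p2].
coeffs = np.polyfit(x, y, deg=2)

# Evaluate the fitted polynomial at the original x values.
fitted = np.polyval(coeffs, x)

print(coeffs)
```

Note that this fits y as a function of x, like Excel's trend line; it does not estimate a probability distribution from a bare list of values.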


There's also curve_fit:

from scipy.optimize import curve_fit
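
A rough sketch of how curve_fit might be used follows. The exponential model, the parameter names a and b, the synthetic data, and the starting guess p0 are all assumptions chosen for illustration; you would substitute whatever functional form you expect your data to follow.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model: y = a * exp(b * x). Replace with your own function.
def model(x, a, b):
    return a * np.exp(b * x)

# Synthetic, noise-free data generated from known parameters (a=1.5, b=0.8).
x = np.linspace(0.0, 2.0, 20)
y = model(x, 1.5, 0.8)

# p0 is an initial guess; curve_fit returns the best-fit parameters
# and the estimated covariance matrix of those parameters.
params, cov = curve_fit(model, x, y, p0=(1.0, 1.0))

print(params)
```

Unlike polyfit, this works for any model function you can write down, but it usually needs a reasonable starting guess to converge.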


You may want to try the time series analysis in statsmodels.tsa. Check out the code below:

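from statsmodels.tsa.seasonal import seasonal_decompose

# df_train is assumed to be a pandas Series (or single-column DataFrame)
# with a regular datetime index; it is not defined in this snippet.
decomp = seasonal_decompose(df_train)

# The result exposes the three components of the decomposition.
trend = decomp.trend
seasonal = decomp.seasonal
residual = decomp.resid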

One caveat: I found that the seasonal part does not handle heteroscedasticity well, that is, when the amplitude of your periodic function grows with time. The decomposition keeps the periodic amplitude constant (that is part of the seasonal component), so your residual will show a periodic effect.
