How to find a best fit distribution function for a list of data?
I am aware of the many probability functions built into Python in the random module.
I'd like to know if, given a list of floats, it would be possible to find the distribution equation that best fits the list?
I don't know if numpy does it, but this function could be compared (not equal, but similar) with Excel's "Trend" function.
How would I do that?
Look at numpy.polyfit
numpy.polyfit(x, y, deg, rcond=None, full=False)
Least squares polynomial fit.
Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients p that minimises the squared error.
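As a rough illustration of the call described above, here is a minimal sketch. The sample data is made up for the example (points lying exactly on a known quadratic) so the recovered coefficients are easy to check; with real data you would pass your own x and y lists.

```python
import numpy as np

# Hypothetical sample data: points lying exactly on y = 2x^2 - 3x + 1
x = np.linspace(-2.0, 2.0, 9)
y = 2.0 * x**2 - 3.0 * x + 1.0

# Fit a degree-2 polynomial; coefficients are returned highest power first
coeffs = np.polyfit(x, y, deg=2)

# np.poly1d wraps the coefficients in a callable polynomial,
# which can then be evaluated at any x to get the fitted trend
trend = np.poly1d(coeffs)
fitted = trend(x)

print(coeffs)
```

Evaluating trend at new x values gives something close in spirit to Excel's "Trend", although the degree has to be chosen by hand.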
There's also curve_fit, which fits an arbitrary model function rather than only a polynomial:
from scipy.optimize import curve_fit
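A minimal sketch of how curve_fit might be used follows. The exponential model, the name exp_model, the synthetic noise-free data and the starting guess p0 are all assumptions made for illustration; substitute whatever functional form you expect your data to follow.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, a, b):
    """Hypothetical model to fit: y = a * exp(b * x)."""
    return a * np.exp(b * x)

# Synthetic, noise-free data generated from known parameters a=1.5, b=0.8
x = np.linspace(0.0, 2.0, 20)
y = exp_model(x, 1.5, 0.8)

# p0 is the initial guess for (a, b); curve_fit returns the best-fit
# parameters and their estimated covariance matrix
params, covariance = curve_fit(exp_model, x, y, p0=(1.0, 1.0))
a_fit, b_fit = params

print(a_fit, b_fit)
```

Unlike polyfit, you have to supply the shape of the function yourself, so this helps most when you already have a candidate equation in mind.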
You may want to try the time series analysis in statsmodels.tsa. Check out the code below:
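from statsmodels.tsa.seasonal import seasonal_decompose
# df_train is assumed to be a pandas Series with a DatetimeIndex of known frequency;
# otherwise pass period=... explicitly
decomp = seasonal_decompose(df_train)
trend = decomp.trend        # long-term movement
seasonal = decomp.seasonal  # repeating periodic component
residual = decomp.resid     # what remains after removing trend and seasonal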
One caveat: I found that the seasonal part does not handle heteroscedasticity well, that is, when the amplitude of your periodic function grows with time. It keeps the periodic amplitude constant (that is part of seasonal), so your residual will show a periodic effect.