numpy: code to update least squares with more observations
I am looking for a numpy
-based implementation of ordinary least squares that would allow the fit to be updated with more observations. Something along the lines of Applied Statistics alg开发者_运维问答orithm AS 274 or R's biglm
.
Failing that, a routine for updating a QR decomposition with new rows would also be of interest.
Any pointers?
scikits.statsmodels has an recursive OLS that updates the inverse X'X in the sandbox that could be used for this. (used only to calculate recursive OLS residuals.)
Nathaniel Smith posted his code for OLS when the data is too large to fit in memory to the scipy-user mailing list. The main code updates X'X.
I think econpy also has a function for this.
Pandas has an expanding OLS, but it may not be easy to use in an online fashion.
Nathaniels code might be the closest to biglm. I don't think there is anything for general linear model (error covariance different from identity).
All need some work before they can be used for this. I don't know of any python(-wrapped) code that would update QR.
update: see http://mail.scipy.org/pipermail/scipy-dev/2010-February/013853.html
there is incremental qr and cholesky in cholmod available, but I didn't try it, either license or compilation on windows problems, and I don't think I tried to get incremental_qr to work see attachements
http://mail.scipy.org/pipermail/scipy-dev/2010-February/013844.html
You might try the pythonequations project at http://code.google.com/p/pythonequations/downloads/list, though it may be more than you need it does use scipy and numpy. That code is the middleware for the http://zunzun.com online curve and surface fitting web site (I'm the author). The source code comes with many examples. Alternatively, the web site alone may be sufficient - please give it a try.
James Phillips
2548 Vera Cruz Drive
Birmingham, AL 35235 USA
zunzun@zunzun.com
This is not a detailed answer yet, but:
AFAIK, the QR
update like this is not implemented in numpy
, but anyway I'll like ask you to specify a more detailed manner what you are actually aiming for.
Especially, why it would not be acceptable to just calculate new estimate for x
(of Ax= b
) with k
latest observations, when (bunch of) new observations arrives (and with modern hardware, k
indeed can be quite large one)?
The LSQ.F90
part of the file compiles easily enough with,
gfortran-4.4 -shared -fPIC -g -o lsq.so LSQ.F90
and this works in Python,
from ctypes import cdll
lsq = cdll.LoadLibrary('./lsq.so')
As soon as I figure out the function call I'll include it in this answer.
精彩评论