开发者

numpy: code to update least squares with more observations

I am looking for a numpy-based implementation of ordinary least squares that would allow the fit to be updated with more observations. Something along the lines of Applied Statistics alg开发者_运维问答orithm AS 274 or R's biglm.

Failing that, a routine for updating a QR decomposition with new rows would also be of interest.

Any pointers?


scikits.statsmodels has an recursive OLS that updates the inverse X'X in the sandbox that could be used for this. (used only to calculate recursive OLS residuals.)

Nathaniel Smith posted his code for OLS when the data is too large to fit in memory to the scipy-user mailing list. The main code updates X'X.

I think econpy also has a function for this.

Pandas has an expanding OLS, but it may not be easy to use in an online fashion.

Nathaniels code might be the closest to biglm. I don't think there is anything for general linear model (error covariance different from identity).

All need some work before they can be used for this. I don't know of any python(-wrapped) code that would update QR.

update: see http://mail.scipy.org/pipermail/scipy-dev/2010-February/013853.html

there is incremental qr and cholesky in cholmod available, but I didn't try it, either license or compilation on windows problems, and I don't think I tried to get incremental_qr to work see attachements

http://mail.scipy.org/pipermail/scipy-dev/2010-February/013844.html


You might try the pythonequations project at http://code.google.com/p/pythonequations/downloads/list, though it may be more than you need it does use scipy and numpy. That code is the middleware for the http://zunzun.com online curve and surface fitting web site (I'm the author). The source code comes with many examples. Alternatively, the web site alone may be sufficient - please give it a try.

 James Phillips
 2548 Vera Cruz Drive
 Birmingham, AL  35235  USA

 zunzun@zunzun.com


This is not a detailed answer yet, but:

AFAIK, the QR update like this is not implemented in numpy, but anyway I'll like ask you to specify a more detailed manner what you are actually aiming for.

Especially, why it would not be acceptable to just calculate new estimate for x (of Ax= b) with k latest observations, when (bunch of) new observations arrives (and with modern hardware, k indeed can be quite large one)?


The LSQ.F90 part of the file compiles easily enough with,

gfortran-4.4 -shared -fPIC -g -o lsq.so LSQ.F90

and this works in Python,

from ctypes import cdll

lsq = cdll.LoadLibrary('./lsq.so')

As soon as I figure out the function call I'll include it in this answer.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜