开发者

Efficient Multiple Linear Regression in C# / .Net

Does anyone know of an efficient way to 开发者_运维问答do multiple linear regression in C#, where the number of simultaneous equations may be in the 1000's (with 3 or 4 different inputs). After reading this article on multiple linear regression I tried implementing it with a matrix equation:

Matrix y = new Matrix(
    new double[,]{{745},
                  {895},
                  {442},
                  {440},
                  {1598}});

Matrix x = new Matrix(
     new double[,]{{1, 36, 66},
                 {1, 37, 68},
                 {1, 47, 64},
                 {1, 32, 53},
                 {1, 1, 101}});

Matrix b = (x.Transpose() * x).Inverse() * x.Transpose() * y;

for (int i = 0; i < b.Rows; i++)
{
  Trace.WriteLine("INFO: " + b[i, 0].ToDouble());
}

However it does not scale well to the scale of 1000's of equations due to the matrix inversion operation. I can call the R language and use that, however I was hoping there would be a pure .Net solution which will scale to these large sets.

Any suggestions?

EDIT #1:

I have settled using R for the time being. By using statconn (downloaded here) I have found it to be both fast & relatively easy to use this method. I.e. here is a small code snippet, it really isn't much code at all to use the R statconn library (note: this is not all the code!).

_StatConn.EvaluateNoReturn(string.Format("output <- lm({0})", equation));
object intercept = _StatConn.Evaluate("coefficients(output)['(Intercept)']");
parameters[0] = (double)intercept;
for (int i = 0; i < xColCount; i++)
{
  object parameter = _StatConn.Evaluate(string.Format("coefficients(output)['x{0}']", i));
  parameters[i + 1] = (double)parameter;
}


For the record, I recently found the ALGLIB library which, whilst not having much documentation, has some very useful functions such as the linear regression which is one of the things I was after.

Sample code (this is old and unverified, just a basic example of how I was using it). I was using the linear regression on time series with 3 entries (called 3min/2min/1min) and then the finishing value (Final).

public void Foo(List<Sample> samples)
{
  int nAttributes = 3; // 3min, 2min, 1min
  int nSamples = samples.Count;
  double[,] tsData = new double[nSamples, nAttributes];
  double[] resultData = new double[nSamples];

  for (int i = 0; i < samples.Count; i++)
  {
    tsData[i, 0] = samples[i].Tminus1min;
    tsData[i, 1] = samples[i].Tminus2min;
    tsData[i, 2] = samples[i].Tminus3min;

    resultData[i] = samples[i].Final;
  }

  double[] weights = null;
  int fitResult = 0;
  alglib.lsfit.lsfitreport rep = new alglib.lsfit.lsfitreport();
  alglib.lsfit.lsfitlinear(resultData, tsData, nSamples, nAttributes, ref fitResult, ref weights, rep);

  Dictionary<string, double> labelsAndWeights = new Dictionary<string, double>();
  labelsAndWeights.Add("1min", weights[0]);
  labelsAndWeights.Add("2min", weights[1]);
  labelsAndWeights.Add("3min", weights[2]);
}


The size of the matrix being inverted does NOT grow with the number of simultaneous equations (samples). x.Transpose() * x is a square matrix where the dimension is the number of independent variables.


To do linear regressions I tend to use Math.Net Numerics.

Math.NET Numerics aims to provide methods and algorithms for numerical computations in science, engineering and every day use. Covered topics include special functions, linear algebra, probability models, random numbers, interpolation, integration, regression, optimization problems and more.

For example, if you want to fit your data to a line using a linear regression, it is as simple as this:

double[] xdata = new double[] { 10, 20, 30 };
double[] ydata = new double[] { 15, 20, 25 };
Tuple"<"double, double">" p = Fit.Line(xdata, ydata);
double a = p.Item1; // == 10; intercept
double b = p.Item2; // == 0.5; slope


Try Meta.Numerics:

Meta.Numerics is a library for advanced scientific computation in the .NET Framework. It can be used from C#, Visual Basic, F#, or any other .NET programming language. The Meta.Numerics library is fully object-oriented and optimized for speed of implementation and execution.

To populate a matrix, see an example of the ColumnVector Constructor (IList<Double>). It can construct a ColumnVector from many ordered collections of reals, including double[] and List.


I can suggest to use FinMath. It is extremely-optimized .net numerical computation library. It uses Intel Math Kernel Library to do complex calculations such as linear regression or matrix inverse, but most classes have very simple approachable interfaces. And of course, it's scalable to a large sets of data. mrnye's example will be look like this:

using FinMath.LeastSquares;
using FinMath.LinearAlgebra;

Vector y = new Vector(new double[]{745,
    895,
    442,
    440,
    1598});

Matrix X = new Matrix(new double[,]{
    {1, 36, 66},
    {1, 37, 68},
    {1, 47, 64},
    {1, 32, 53},
    {1, 1, 101}});

Vector b = OrdinaryLS.FitOLS(X, y);

Console.WriteLine(b);


I recently came across MathNet-Numerics - which is available under MIT license.

It claims to provide faster alternatives for the common (X.Transpose() * X).Inverse() * (X.Transpose() * y) process.

Here is are some optimizations from this article. First one being:

X.TransposeThisAndMultiply(X).Inverse() * X.TransposeThisAndMultiply(y)

Or, you could use Cholesky decomposition:

X.TransposeThisAndMultiply(X).Cholesky().Solve(X.TransposeThisAndMultiply(y))
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜