
How do I improve this linear regression function?

I have the following PHP function that I'm using to draw a trend line. However, it sometimes plots the line below all the points in the scatter graph. Is there an error in my function, or is there a better way to do it? I suspect that the line it produces treats all the residuals (the distances from the scatter points to the line) as positive, regardless of whether they are above or below the line.

function linear_regression($x, $y) {

  $n = count($x);

  $x_sum = array_sum($x);
  $y_sum = array_sum($y);

  $xx_sum = 0;
  $xy_sum = 0;

  for($i = 0; $i < $n; $i++) {
    $xy_sum+=($x[$i]*$y[$i]);
    $xx_sum+=($x[$i]*$x[$i]);
  }

  $m = (($n * $xy_sum) - ($x_sum * $y_sum)) / (($n * $xx_sum) - ($x_sum * $x_sum));
  $b = ($y_sum - ($m * $x_sum)) / $n;

  return array("m"=>$m, "b"=>$b);
}


Here is a good function written by Richard@Home:

/**
 * linear regression function
 * @param $x array x-coords
 * @param $y array y-coords
 * @returns array() m=>slope, b=>intercept
 */
function linear_regression($x, $y) {

  // calculate number points
  $n = count($x);

  // ensure both arrays of points are the same size
  if ($n != count($y)) {
    trigger_error("linear_regression(): Number of elements in coordinate arrays do not match.", E_USER_ERROR);
  }

  // calculate sums
  $x_sum = array_sum($x);
  $y_sum = array_sum($y);

  $xx_sum = 0;
  $xy_sum = 0;

  for($i = 0; $i < $n; $i++) {
    $xy_sum+=($x[$i]*$y[$i]);
    $xx_sum+=($x[$i]*$x[$i]);
  }

  // calculate slope
  $m = (($n * $xy_sum) - ($x_sum * $y_sum)) / (($n * $xx_sum) - ($x_sum * $x_sum));

  // calculate intercept
  $b = ($y_sum - ($m * $x_sum)) / $n;

  // return result
  return array("m"=>$m, "b"=>$b);
}

Example Usage:

var_dump( linear_regression(array(1, 2, 3, 4), array(1.5, 1.6, 2.1, 3.0)) );

https://richardathome.wordpress.com/2006/01/25/a-php-linear-regression-function/


I don't see an obvious problem with your function, but if it produces wrong-looking results only sometimes, you may be running into numerical problems such as overflow or loss of precision. The formula you use is not computationally robust: with large values it subtracts two large, nearly equal quantities. The simple linear regression article on Wikipedia gives a different formula (right before the one you use), based on deviations from the means, which is less likely to suffer from this.
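
The alternative formula from the Wikipedia article, which works with deviations from the means instead of raw sums, might be sketched in PHP as follows. This is only an illustration, not a drop-in fix: the function name linear_regression_centered and the exception-based error handling are my own assumptions, and the sample data is taken from the usage example earlier on this page. The residual check at the end is there because a least-squares line always has residuals that sum to (approximately) zero, so a fitted line lying below every point would point to a problem in the data or the plotting rather than in the fit.

```php
<?php
/**
 * Simple linear regression using deviations from the means.
 * Hypothetical helper for illustration; not part of the original post.
 *
 * @param array $x x-coords
 * @param array $y y-coords
 * @return array m=>slope, b=>intercept
 */
function linear_regression_centered(array $x, array $y): array {
    $n = count($x);
    if ($n === 0 || $n !== count($y)) {
        throw new InvalidArgumentException("Coordinate arrays must be non-empty and the same size.");
    }

    // Re-index so that $x[$i] and $y[$i] line up even if the keys are not 0..n-1.
    $x = array_values($x);
    $y = array_values($y);

    $x_mean = array_sum($x) / $n;
    $y_mean = array_sum($y) / $n;

    // Accumulate sums of products of deviations; these stay small
    // even when the raw coordinates are large.
    $sxy = 0.0;
    $sxx = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $x_mean;
        $sxy += $dx * ($y[$i] - $y_mean);
        $sxx += $dx * $dx;
    }

    if ($sxx == 0.0) {
        throw new InvalidArgumentException("All x values are identical; the slope is undefined.");
    }

    $m = $sxy / $sxx;
    $b = $y_mean - $m * $x_mean;

    return array("m" => $m, "b" => $b);
}

// Same sample data as the usage example above.
$x = array(1, 2, 3, 4);
$y = array(1.5, 1.6, 2.1, 3.0);
$fit = linear_regression_centered($x, $y);

// Sanity check: signed residuals of a least-squares fit should sum to about zero.
$residual_sum = 0.0;
foreach ($x as $i => $xi) {
    $residual_sum += $y[$i] - ($fit["m"] * $xi + $fit["b"]);
}

var_dump($fit, $residual_sum);
```

If the residual sum is far from zero for your real data, the inputs are worth checking first (for example mismatched array keys, or strings where numbers are expected) before suspecting the regression itself.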
