Online algorithm for calculating absolute deviation

2023-01-19 23:50 问答作者：

I'm trying to calculate the absolute deviation of a vector online, that is, as each item in the vector is received, without using the entire vector. The absolute deviation is the sum of the absolute difference between each item in a vector and the mean:

I know that the variance of a vector can be calculated in such a manner. Variance is similar to absolute deviation, but each difference is squared:

The online algorithm for variance is as follows:

n = 0
mean = 0
M2 = 0

def calculate_online_variance(x):
    n = n + 1
    delta = x - mean
    mean = mean + delta/n
    M2 = M2 + delta*(x - mean)  # This expression uses the new value of mean
    variance_n = M2/n
    return variance_n

Is there such an algorithm for calculating absolute deviance? I cannot formulate a recursive d开发者_如何学Goefinition myself, but wiser heads may prevail!

As the absolute deviation between x and the mean can be defined as the square root of the squared difference, the adaptation is trivial if you are happy with a consistent but biased estimate (meaning the limit to infinity is the expected value) :

n = 0
mean = 0
M2 = 0

def calculate_online_avg_abs_dev(x):
    n = n + 1
    delta = x - mean
    mean = mean + delta/n
    M2 = M2 + sqrt(delta*(x - mean))  
    avg_abs_dev_n = M2/n

This is for the case of the average absolute deviation. Normally the mad is used (median absolute deviation), which is impossible to program recursively. but the average absolute deviation is as useful in most cases. When we're talking about hundreds of values from close-to-normal distributions, both values are very close.

If you just want the sum of the absolute devations, life is even simpler: just return M2.

Be aware of the fact that BOTH the algorithm you gave and the trivial adaptation for the absolute deviation are slightly biased.

A simulation in R to prove the algorithm works this way :

Online algorithm for calculating absolute deviation

The red line is the true value, the black line is the progressive value following the algorithm outlined above.

Code :

calculate_online_abs_dev <- function(x,n){
  M2=0
  mean=0
  out <- numeric(n)
  for(i in 1:n) {
      delta <- x[i] - mean
      mean <- mean + delta/i
      M2 = M2 + sqrt(delta*(x[i] - mean))
      out[i] <- M2/i

  }
  return(out)
}

set.seed(2010)
x <- rnorm(100)

Abs_Dev <- calculate_online_abs_dev(x,length(x))
True_Val <- sapply(1:length(x),function(i)sum(abs(x[1:i]-mean(x[1:i])))/i)

plot(1:length(x),Abs_Dev,type="l",xlab="number of values",lwd=2)
lines(1:length(x),True_Val,col="red",lty=2,lwd=2)
legend("bottomright",lty=c(1,2),col=c("black","red"),
  legend=c("Online Calc","True Value"))

I don't think it's possible.

In the formula for variance it is possible to separate the x and x² terms, so that it suffices to keep track of those sums (and n). In the formula for absolute deviation this is not possible.

I think the best one can do (apart from keeping the whole vector and calculating the absolute deviation on demand) is keep a sorted list of elements. This is O(log(n)) for each new element, but after you add an element the cost of recalculating the absolute deviation is O(log(n)). This may or may not be worthwhile, depending on your application.

The formula for variance that you give is ONE of the many that are possible (I can think of three distinct ways to do that computation) although I have not verified that yours is correct. It looks reasonably close to what I recall.

The problem is that absolute value is actually more "nonlinear" in a sense than is the sum of squares of the deviations. This prevents you from being able to do that calculation in a recursive form in a loop, at least not without retaining all of the previous values of x. You must compute the overall mean in advance for that sum.

Edit: I see that beta agrees with me. IF you saved all of the previous data points, in a sorted list, you could then efficiently compute the updated desired deviation. But this is counter to the spirit of your request.

继续阅读：algorithm language-agnostic math statistics

Online algorithm for calculating absolute deviation

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生 新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？