开发者

Update the quantile for a dataset when a new datapoint is added

Suppose I have a list of numbers and I've computed the q-quantile (using Quantile). Now a new datapoint comes along and I want to update my q-quantile, without having stored th开发者_运维技巧e whole list of previous datapoints. What would you recommend?

Perhaps it can't be done exactly without, in the worst case, storing all previous datapoints. In that case, can you think of something that would work well enough?


One idea I had, if you can assume normality, is to use the inverse CDF instead of the q-quantile. Keep track of the sample variance as you go and then you can compute InverseCDF[NormalDistribution[sampleMean,sampleVariance], q] which should be the value such that a fraction q of the values are smaller, which is what the q-quantile is.

(I see belisarius was thinking along the same lines. Here's the link he pointed to: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm )


Unless you know that your underlying data comes from some distribution, it is not possible to update arbitrary quantiles without retaining the original data. You can, as others suggested, assume that the data has some sort of distribution and store the quantiles this way, but this is a rather restrictive approach.

Alternately, have you thought of programming this somewhere besides Mathematica? For example, you could create a class for your datapoints that contains (1) the Double value and (2) some timestamp for when the data came in. In a SortedList of these datapoints classes (which compares based on value), you could get the quantile very fast by simply referencing the index of the datapoints. Want to get a historical quantile? Simply filter on the timestamps in your sorted list.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜