开发者

Probability of finding the median with finite space

This is a spin off of this StackOverflow question.

Assume that you have a fixed number k of storage locations, and space for two counters. You will receive n items in random order (all permutations of the n items are equally likely). After receiving each item you can either store it in one of the k locations (discarding one of the previously stored values), or discard the item. You can also increment or decrement either of the counters. Any discarded item cannot be retrieved.

The questions are

  1. What is the strategy that maximizes your probability of finding the exact median?
  2. What is that probability?

Obviously, if k > n/2, we can find the median. In general it seems that the same strategy of attempting to keep the number of discarded high values equal to the number of discarded low values should be optimal, but开发者_运维知识库 I am not sure how to prove it, nor how to figure out the probability that it finds the median.

Also of interest is the case where we do not know n but know the probability distribution of n.

Edit: Assume for now that the values are distinct (no duplicates.) However, if you can solve the non distinct case as well, then that would be impressive.


Munro and Paterson studied essentially this problem in their paper Selection and sorting with limited storage. They show that your algorithm requires k = Ω(√n) to succeed with constant probability and that this is asymptotically optimal by appealing to basic results about one-dimensional random walks.

If I wanted to prove absolute optimality, the first thing I would try would be to consider an arbitrary algorithm A and then couple its execution with an algorithm A' that, the first time A deviates from your algorithm, does your algorithm would do instead and then attempts to follow A as closely as it can.


A wild guess: discard the element that is farthest from the mean of the currently stored values.

Comparing to the current median doesn't work if the distribution of values is multi-modal and we get values from a non-dominant mode first.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜