Probability of finding the median with finite space

2023-01-09 22:58 问答作者：

This is a spin off of this StackOverflow question.

Assume that you have a fixed number k of storage locations, and space for two counters. You will receive n items in random order (all permutations of the n items are equally likely). After receiving each item you can either store it in one of the k locations (discarding one of the previously stored values), or discard the item. You can also increment or decrement either of the counters. Any discarded item cannot be retrieved.

The questions are

What is the strategy that maximizes your probability of finding the exact median?
What is that probability?

Obviously, if k > n/2, we can find the median. In general it seems that the same strategy of attempting to keep the number of discarded high values equal to the number of discarded low values should be optimal, but开发者_运维知识库 I am not sure how to prove it, nor how to figure out the probability that it finds the median.

Also of interest is the case where we do not know n but know the probability distribution of n.

Edit: Assume for now that the values are distinct (no duplicates.) However, if you can solve the non distinct case as well, then that would be impressive.

Munro and Paterson studied essentially this problem in their paper Selection and sorting with limited storage. They show that your algorithm requires k = Ω(√n) to succeed with constant probability and that this is asymptotically optimal by appealing to basic results about one-dimensional random walks.

If I wanted to prove absolute optimality, the first thing I would try would be to consider an arbitrary algorithm A and then couple its execution with an algorithm A' that, the first time A deviates from your algorithm, does your algorithm would do instead and then attempts to follow A as closely as it can.

A wild guess: discard the element that is farthest from the mean of the currently stored values.

Comparing to the current median doesn't work if the distribution of values is multi-modal and we get values from a non-dominant mode first.

继续阅读：algorithm probability

Probability of finding the median with finite space

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？