开发者

Parallel computation of the median of a large array

I got asked this question once and still haven't been able to figure it out:

You have an array o开发者_运维技巧f N integers, where N is large, say, a billion. You want to calculate the median value of this array. Assume you have m+1 machines (m workers, one master) to distribute the job to. How would you go about doing this?

Since the median is a nonlinear operator, you can't just find the median in each machine and then take the median of those values.


Depending on the Parallel Computation Model, algorithms could vary. (Note: the pdf linked to in previous sentence just contains some of the many possible ones).

Finding the median is a special case of finding the ith element. This problem is called 'selection problem', so you need to search the web for parallel selection.

Here is one paper (unfortunately, not free) which might be useful: Parallel Selection Algorithms With Analysis on Clusters.

And google's first link for the query "Parallel Selection" gives: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html which actually uses the median of medians for the general problem and not just median finding.


You could do a highly parallelizable sort (like merge sort) and get the median from the result.


Would sorting the array be overkill? If not, then divide up the array and then merge the results together is my suggestion.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜