
Hadoop: Iterative MapReduce Performance

Is it correct to say that parallel computation with iterative MapReduce is justified mainly when the training data is too large for non-parallel computation of the same logic?

I am aware that there is overhead for starting MapReduce jobs, and this can be critical for the overall execution time when a large number of iterations is required. A minimal sketch of the usual iterative driver pattern in Java is shown below.
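In this sketch, MyMapper and MyReducer are hypothetical stand-ins for the per-iteration logic. The point is structural: every pass through the loop submits a completely new job, so the fixed startup cost (scheduling, task JVM launch, HDFS round-trips) is paid once per iteration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IterativeDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            int iterations = Integer.parseInt(args[0]);
            Path input = new Path(args[1]);

            for (int i = 0; i < iterations; i++) {
                // A brand-new job per iteration: scheduling, task JVM
                // startup and HDFS round-trips are paid every time.
                Job job = Job.getInstance(conf, "iteration-" + i);
                job.setJarByClass(IterativeDriver.class);
                job.setMapperClass(MyMapper.class);   // hypothetical mapper
                job.setReducerClass(MyReducer.class); // hypothetical reducer
                FileInputFormat.addInputPath(job, input);
                Path output = new Path(args[1] + "-iter-" + i);
                FileOutputFormat.setOutputPath(job, output);
                if (!job.waitForCompletion(true)) break;
                input = output; // this iteration's output feeds the next
            }
        }
    }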

In many cases, I can imagine that sequential computation is faster than parallel computation with iterative MapReduce, as long as the data set fits in memory.
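For contrast, here is a hedged sketch of the sequential, in-memory counterpart (the update rule is a hypothetical stand-in): the data is read from disk exactly once, so every iteration after the first costs only CPU time, with no job startup or HDFS writes in between.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class SequentialDriver {
        public static void main(String[] args) throws Exception {
            // The data set is read from disk exactly once and kept in RAM.
            List<String> data = Files.readAllLines(Paths.get(args[0]));
            int iterations = Integer.parseInt(args[1]);
            double model = 0.0; // hypothetical model state

            for (int i = 0; i < iterations; i++) {
                // Each pass touches only memory: no job startup,
                // no serialization between iterations.
                for (String record : data) {
                    model += record.length() * 1e-6; // stand-in for the real update rule
                }
            }
            System.out.println("final model state: " + model);
        }
    }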


Most of the time, no parallel processing system makes much sense if a single machine can do the job. The complexity associated with most parallelization tasks is significant, and you need a good reason to take it on.

Even when it is obvious that a task can't be solved in acceptable time without parallel processing, parallel execution frameworks come in different flavours: from lower-level, science-oriented tools like PVM or MPI to high-level, specialized (e.g. map/reduce) frameworks like Hadoop.

Among the parameters you should consider are startup time and scalability (how close to linearly the system scales). Hadoop is not a good choice if you need answers quickly, but it may be a good choice if your process fits the map/reduce model.
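As a rough back-of-the-envelope model (my notation, not from the answer above): let k be the number of iterations, s the fixed per-job startup cost, and c_seq and c_par the per-iteration compute time on one machine and on the cluster, respectively. Then

    T_seq ≈ k * c_seq
    T_MR  ≈ k * (s + c_par)

so iterative MapReduce only wins when the per-iteration speedup exceeds the startup cost, i.e. when c_seq - c_par > s. With a hypothetical s of 30 seconds and k = 100, startup alone adds 50 minutes to the total.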


You may refer to the HaLoop project (http://code.google.com/p/haloop), which addresses exactly this problem.

