Map and Reduce with large datasets: how does it work in practice?
I would be thankful for advice:
http://en.wikipedia.org/wiki/MapReduce states: "...a large server farm can use MapReduce to sort a petabyte of data in only a few hours..." and "...The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes..."
I completely do NOT understand how this works in practice. Say I have a SAN (storage) with 1 petabyte of data. How can I distribute that amount of data efficiently through the "master" to the slaves? That is something I cannot understand. Given a 10 Gbit connection from the SAN to the master, and 1 Gbit from the master to each slave, I can "spread" at most 10 Gbit at a time. How can I process petabytes within several hours if I first have to transfer the data to the reducer/worker nodes?
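(Back-of-the-envelope, using the numbers above: 1 PB is about 10^15 bytes = 8 × 10^15 bits; pushed through a single 10 Gbit/s link that is 8 × 10^15 / 10^10 = 8 × 10^5 seconds, i.e. roughly 220 hours of raw transfer time before any processing even starts.)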
Thanks very much! Jens
Actually, in a full-blown MapReduce framework such as Hadoop, the data storage itself is distributed. Hadoop, for example, has HDFS, a distributed file system that provides both redundancy and high throughput. The file system nodes can double as computing nodes, or they can be dedicated storage nodes, depending on how the framework has been deployed.
Usually, when computing times are quoted in this context, it is assumed that the input data already resides in the cluster's distributed storage. The master node merely feeds the computing nodes the data ranges to process, not the data itself.
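As a minimal sketch of what "data ranges, not data" means in practice (using Hadoop's Java FileSystem API; the path /data/input/part-00000 is just a hypothetical example), a client can ask the NameNode which DataNodes physically hold each block of a file, and the scheduler uses exactly this information to run map tasks on the nodes that already store the data:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocality {
        public static void main(String[] args) throws Exception {
            // Picks up the cluster address from core-site.xml / hdfs-site.xml on the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical input file that already lives in HDFS.
            FileStatus status = fs.getFileStatus(new Path("/data/input/part-00000"));

            // Ask the NameNode which hosts store each block of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }

Scheduling map tasks where their blocks already sit is what turns the aggregate disk bandwidth of hundreds or thousands of nodes into the effective I/O rate, instead of the bandwidth of any single link through the master.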
I believe it's because the master node does the management, not the data transfer.
The data is stored on a distributed file system and brought in from several nodes simultaneously. (There's no reason for the data to go through the master node.)
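To make that concrete, what the master actually ships to a worker is a tiny split descriptor rather than the bytes themselves. A sketch (plain Java; the field layout is a hypothetical simplification loosely modeled on Hadoop's FileSplit):

    // Sketch of the metadata a master sends a worker for one map task.
    // The worker opens the file on the distributed file system itself and
    // reads only the range [offset, offset + length), ideally from a local disk.
    public class SplitDescriptor {
        final String path;     // e.g. "hdfs:///data/input/part-00042"
        final long offset;     // start of this worker's slice, in bytes
        final long length;     // typically one HDFS block, e.g. 128 MB
        final String[] hosts;  // nodes that already hold this block locally

        SplitDescriptor(String path, long offset, long length, String[] hosts) {
            this.path = path;
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

A descriptor like this is a few hundred bytes, so even a petabyte of input yields only on the order of millions of descriptors for the master to hand out; the heavy reads never touch it.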