Persistence in MapReduce

Let's say you have divided your work for the map phase of a map/reduce job and mapping is running. Each unit of work takes about 1 minute. Now suppose you need to stop processing. How would you persist the state of the map/reduce job so that you waste the least amount of time when you start back up?


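You'd have to memoize the results in a way that lets you skip most of the processing for rows you've already seen. If there's a candidate key that identifies each row, you can use it to look the row up in a cache and fetch the processed result stored there instead of recomputing it. If each result is written as soon as its unit finishes, a restart should only have to redo the units that were in flight when you stopped.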

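Setting up your cluster with Memcached or Redis would be one approach for achieving memoization. Keep in mind that the cache has to outlive the job: Memcached holds data only in memory, while Redis can optionally persist to disk.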
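To make the idea concrete, here is a minimal sketch of a memoized map phase in Python. Everything named here is an assumption for illustration: `DictCache` is an in-memory stand-in for a shared cache (a real Redis or Memcached client exposes similar `get`/`set` calls), `row_key` derives a hypothetical candidate key by hashing the row, and `memoized_map` is a made-up helper, not part of any map/reduce framework.

```python
import hashlib
import json


class DictCache:
    """In-memory stand-in for a shared cache such as Redis or Memcached."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, like typical cache clients.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def row_key(row):
    # Hypothetical candidate key: a hash of the row's canonical JSON form.
    # If your rows already carry a unique id, use that instead.
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return "map:" + hashlib.sha256(payload).hexdigest()


def memoized_map(rows, map_fn, cache):
    """Apply map_fn to each row, skipping rows whose result is cached.

    Returns (results, computed), where computed counts the rows that
    actually had to be processed on this run.
    """
    results = []
    computed = 0
    for row in rows:
        key = row_key(row)
        cached = cache.get(key)
        if cached is not None:
            # Seen before: reuse the stored result.
            results.append(json.loads(cached))
            continue
        result = map_fn(row)
        # Store the result right away so a stop loses at most this unit.
        cache.set(key, json.dumps(result))
        results.append(result)
        computed += 1
    return results, computed
```

On restart you would run the same call over the same input; rows finished before the stop come back from the cache and only the remaining rows invoke `map_fn`. Results are serialized as JSON here so the same code could sit on top of a string-valued store.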
