Efficient MapReduce when dealing with streams to queries to the same dataset
I have a massive, static dataset and I've a function to apply to it.
f is in the form reduce(map(f, dataset)), so I would use the MapReduce s开发者_开发知识库keleton. However, I don't want to scatter the data at each request (and ideally I want to take advantage of indexing in order to speedup f). There is a MapReduce implementation that address this general case?
I've taken a look at IterativeMapReduce and maybe it does the job, but seems to address a slightly different case, and the code isn't available yet.
Hadoop's MapReduce (and all the others map-reduce skeleton inspired by Google) doesn't scatter the data all the time.
精彩评论