Map Reduce Frameworks/Infrastructure
Map Reduce is a pattern that seems to be getting a lot of traction lately, and I am starting to see it emerge in one of my projects, which is focused on an event processing pipeline (iPhone accelerometer and GPS data). I needed to build a lot of infrastructure for this project; in fact, it outweighs the logic code interacting with it by 2x. Some of the components I built were EventProcessors (with input and output buffers, timing, etc.), EventListeners, Aggregators and a staged Pipeline.
This leads me to my question: what is the "common" infrastructure required for map reduce? Since I work with .NET a lot, I can see map reduce infrastructure built into the framework and language constructs. Functional languages support this paradigm per se. It seems every language can be used with map reduce. There are even languages built around that concept (e.g. Go).
Apache Hadoop brings Map-Reduce to Java. Google has patented a map-reduce framework. What kind of infrastructure do they provide to enable map reduce? What constructs do functional languages offer for implementing map reduce? What does a map-reduce framework need to provide, and what should it provide?
Well, Hadoop's file system (HDFS) is modeled on the Google File System, and the Hadoop MapReduce implementation is likewise based on a paper by Google. For both Google and Hadoop, the component that allows MapReduce to run successfully over massive amounts of data in parallel is the distributed file system.
As I understand it, Hadoop generally builds on HDFS and/or HBase, which act as the data distribution mechanism that Hadoop itself operates on.
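To make the role of data distribution a bit more concrete, here is a minimal single-process sketch of the phases a map-reduce framework typically coordinates: splitting the input, mapping each split, shuffling by key, and reducing. It is not Hadoop's API. Every name in it (`split_input`, `map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) is hypothetical, the list slicing only stands in for blocks in a distributed file system, and the thread pool only imitates map tasks running on separate nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_input(records, num_splits):
    """Stand-in for the distributed file system: cut the input into blocks."""
    return [records[i::num_splits] for i in range(num_splits)]


def map_phase(split):
    """Map: emit (key, value) pairs; here (word, 1) for every word."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_splits):
    """Shuffle: group all emitted values by key across every split."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: fold each key's list of values into a single result."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def map_reduce(records, num_splits=2):
    splits = split_input(records, num_splits)
    # Assumption: a real framework would schedule each map task on the node
    # that holds the block; a local thread pool only imitates that.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        mapped = list(pool.map(map_phase, splits))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    lines = ["map reduce map", "reduce shuffle", "map"]
    print(map_reduce(lines))
```

In this sketch the only job-specific logic is in `map_phase` and `reduce_phase`. Splitting, scheduling, and shuffling are the parts a framework such as Hadoop would take over, along with things omitted here like fault tolerance and moving data between machines.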
There's also Amazon Elastic MapReduce, a shiny web frontend that uses EC2 and Hadoop to make things easier. The "infrastructure" in this case is EC2 and S3.
P.S. Sorry for the snippy comment :)
Since you are used to working with .NET, you may want to look at DryadLINQ. http://research.microsoft.com/en-us/downloads/03960cab-bb92-4c5c-be23-ce51aee0792c/default.aspx