Does IBM General Parallel File System (GPFS) support Map/Reduce jobs?
I am studying various distributed file systems.
Does IBM General Parallel File System (GPFS) support Map/Reduce jobs on its own, without using 3rd-party software (like Hadoop Map/Reduce)?
Thanks!
In 2009, GPFS was extended to work seamlessly with Hadoop as the GPFS Shared Nothing Cluster architecture, which is now available under the name GPFS File Placement Optimizer (FPO). FPO allows complete control over data placement for all replicas, if applications so desire. Of course, you can also easily configure it to match HDFS-style allocation.
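As a rough illustration of what that placement control looks like, FPO behavior is enabled per storage pool via stanza attributes such as `allowWriteAffinity`, `writeAffinityDepth`, and `blockGroupFactor` (described in the FPO settings topic linked below). The following stanza-file sketch is only an assumption of a minimal FPO-enabled pool; node names, pool names, and values are hypothetical, and exact parameters should be verified against the GPFS documentation for your release:

```
# Hypothetical stanza file for mmcrfs: one FPO-style data pool.
%pool:
  pool=datapool
  blockSize=2M
  layoutMap=cluster          # cluster-style block allocation map
  allowWriteAffinity=yes     # HDFS-like local writes: first replica on the writing node
  writeAffinityDepth=1
  blockGroupFactor=128       # group consecutive blocks on the same disk, as HDFS-sized chunks

%nsd:
  nsd=nsd1
  servers=node1              # hypothetical node name
  usage=dataOnly
  pool=datapool
```

With `allowWriteAffinity=yes`, the first replica of newly written data lands on the node performing the write, which is what makes the layout comparable to HDFS block placement.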
Check out details at http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r5.gpfs200.doc%2Fbl1adv_fposettings.htm
GPFS was developed years, nearly decades, before Map/Reduce was invented as a distributed computing paradigm. GPFS by itself has no Map/Reduce capability, as it is mainly aimed at HPC, where the storage nodes are distinct from the compute nodes.
Therefore Map/Reduce can be done with 3rd-party software (by mounting GPFS on all Hadoop nodes), but it would not be very effective, as all data is far away: no data locality can be exploited, caches are more or less useless, etc.
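To make the "mount GPFS on all Hadoop nodes" approach concrete: since a mounted GPFS looks like a local POSIX file system, Hadoop can be pointed at it through the local file system scheme in `core-site.xml`. The mount point below is hypothetical, and this is only a sketch of the configuration idea, not a tuned setup:

```xml
<!-- Hypothetical core-site.xml fragment: run Hadoop jobs directly
     against a GPFS mount (here /gpfs/fs1) via the file:// scheme,
     bypassing HDFS entirely. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///gpfs/fs1</value>
  </property>
</configuration>
```

Note that with this setup every task reads over the GPFS network fabric rather than from a local disk, which is exactly the loss of data locality described above.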