Multiple maps followed by one reduce with Hadoop and HBase
I have several HBase tables. I wish to run a map task on each table (each map being a different Mapper
class since each table contains heterogeneous data) followed by one reduce.
I cannot work out whether this is possible without explicitly reducing the data after each map into an interim SequenceFile.
Any help would be gratefully received.
It seems you can only run a MapReduce job on one table at a time (see TableMapReduceUtil). So your best bet is most probably what you suspected: save the output of each table's job to an interim location (e.g. a SequenceFile or a temporary HBase table), then write a final MapReduce job that takes those locations as input and merges the results. Also, if each job already writes its output in a common format, you may not even need the final merge job.
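To make the data flow concrete, here is a minimal sketch in plain Java. It only simulates the pattern and uses no Hadoop or HBase classes: in-memory lists stand in for the tables and for the interim outputs. The class name, the two row formats and the mapper names (mapOrders, mapClicks) are all made up for illustration. The point is that each table gets its own mapper, every mapper emits the same key/value format, and a single reduce then runs over all the interim outputs.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java simulation of the data flow only; no Hadoop or HBase API is used.
class MultiTableMergeSketch {

    // Hypothetical mapper for a table whose rows look like "user:count".
    static Map.Entry<String, Long> mapOrders(String row) {
        String[] parts = row.split(":");
        return new SimpleImmutableEntry<>(parts[0], Long.parseLong(parts[1]));
    }

    // Hypothetical mapper for a table whose rows look like "count,user".
    static Map.Entry<String, Long> mapClicks(String row) {
        String[] parts = row.split(",");
        return new SimpleImmutableEntry<>(parts[1], Long.parseLong(parts[0]));
    }

    // Stands in for one map-only job that writes its output to an interim location.
    static List<Map.Entry<String, Long>> runMapJob(
            List<String> table, Function<String, Map.Entry<String, Long>> mapper) {
        List<Map.Entry<String, Long>> interim = new ArrayList<>();
        for (String row : table) {
            interim.add(mapper.apply(row));
        }
        return interim;
    }

    // Stands in for the final job: one reduce (a sum per key) over all interim outputs.
    static Map<String, Long> reduceAll(List<List<Map.Entry<String, Long>>> interimOutputs) {
        Map<String, Long> totals = new TreeMap<>();
        for (List<Map.Entry<String, Long>> output : interimOutputs) {
            for (Map.Entry<String, Long> pair : output) {
                totals.merge(pair.getKey(), pair.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> orders =
                runMapJob(Arrays.asList("alice:3", "bob:1"), MultiTableMergeSketch::mapOrders);
        List<Map.Entry<String, Long>> clicks =
                runMapJob(Arrays.asList("5,alice", "2,carol"), MultiTableMergeSketch::mapClicks);
        // prints {alice=8, bob=1, carol=2}
        System.out.println(reduceAll(Arrays.asList(orders, clicks)));
    }
}
```

In a real setup each runMapJob call would be a separate job configured against one table, and reduceAll would be the final job reading the interim files. The part that carries over is the shared key/value format, since that is what lets one reducer consume output from heterogeneous tables.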