Hadoop map/reduce chaining
I want to chain 2 Map/Reduce jobs. I am trying to use JobControl to achieve the same. My problem is -
JobControl needs org.apache.hadoop.mapred.jobcontrol.Job which in turn needs org.apache.hadoop.mapred.JobConf which is deprecated. How do I get around this problem to chain my Map/Reduce?
Does anyone have any better ideas for chaining (other than Cascading)?
You could use Riffle. It lets you chain arbitrary processes together (anything you put its annotations on).
It has a rudimentary dependency scheduler, so it will order and execute your jobs for you. It's Apache licensed, and it's also on the Conjars repo if you're a Maven user.
I'm the author, and wrote it so Mahout and other custom applications would be able to have a common tool that was also compatible with Cascading Flows.
I'm also the author of Cascading. But MapReduceFlow + Cascade in Cascading works quite well for most raw MR job chaining.
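As a rough illustration of what a rudimentary dependency scheduler does, here is a minimal plain-Java sketch. It is not Riffle's, Cascading's, or Hadoop's API: the `JobChain` class and its `add`, `order`, and `run` methods are hypothetical names made up for this example, and each "job" is just a `Runnable` standing in for a real Map/Reduce job. The idea is only that jobs declare what they depend on, and the scheduler runs each one after its prerequisites.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not Riffle's, Cascading's, or Hadoop's API.
final class JobChain {
    // Maps each job name to the names of the jobs it depends on.
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();
    // Maps each job name to the work it performs (a stand-in for a real MR job).
    private final Map<String, Runnable> work = new LinkedHashMap<>();

    // Registers a job together with the names of the jobs that must run first.
    void add(String name, Runnable body, String... dependsOn) {
        work.put(name, body);
        dependencies.put(name, Arrays.asList(dependsOn));
    }

    // Returns the job names in an order where every job follows its dependencies.
    List<String> order() {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : dependencies.keySet()) {
            visit(name, done, inProgress, ordered);
        }
        return ordered;
    }

    // Depth-first visit: emit all dependencies of a job before the job itself.
    private void visit(String name, Set<String> done, Set<String> inProgress,
                       List<String> ordered) {
        if (done.contains(name)) {
            return;
        }
        if (!dependencies.containsKey(name)) {
            throw new IllegalArgumentException("Unknown job: " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Cyclic dependency at: " + name);
        }
        for (String dep : dependencies.get(name)) {
            visit(dep, done, inProgress, ordered);
        }
        inProgress.remove(name);
        done.add(name);
        ordered.add(name);
    }

    // Runs every job sequentially in dependency order.
    void run() {
        for (String name : order()) {
            work.get(name).run();
        }
    }
}
```

With this sketch, registering a job "second" that depends on "first" and then calling `run()` executes "first" before "second", regardless of the order in which they were added. A real tool would also have to handle job failure and pass output paths between jobs, which this sketch leaves out.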
Cloudera has a workflow tool called Oozie that can help with this sort of chaining. Might be overkill for just getting one job to run after another.