Hadoop map/reduce chaining

I want to chain 2 Map/Reduce jobs. I am trying to use JobControl to achieve this. My problem is:

JobControl needs org.apache.hadoop.mapred.jobcontrol.Job which in turn needs org.apache.hadoop.mapred.JobConf which is deprecated. How do I get around this problem to chain my Map/Reduce?

Does anyone have any better ideas for chaining (other than Cascading)?


You could use Riffle. It allows you to chain arbitrary processes together (anything you stick its Annotations on).

It has a rudimentary dependency scheduler, so it will order and execute your jobs for you. And it's Apache licensed. It's also on the Conjars repo if you're a Maven user.
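
To make the idea of a dependency scheduler concrete, here is a minimal sketch in plain Java of ordering jobs so that each one runs after the jobs it depends on. This is not Riffle's API: the class `JobOrdering`, the method `order`, and the job names are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: orders named jobs so each runs after its dependencies.
// Not Riffle's API; every name here is made up for illustration.
class JobOrdering {

    // deps maps a job name to the names of the jobs that must finish first.
    static List<String> order(Map<String, List<String>> deps) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String job : deps.keySet()) {
            visit(job, deps, done, inProgress, ordered);
        }
        return ordered;
    }

    // Depth-first walk: emit all dependencies of a job before the job itself.
    private static void visit(String job, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> ordered) {
        if (done.contains(job)) {
            return;
        }
        if (!inProgress.add(job)) {
            // We reached a job that is still being visited, so there is a cycle.
            throw new IllegalStateException("Dependency cycle at job: " + job);
        }
        for (String dep : deps.getOrDefault(job, List.of())) {
            visit(dep, deps, done, inProgress, ordered);
        }
        inProgress.remove(job);
        done.add(job);
        ordered.add(job);
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("report", List.of("aggregate"));
        deps.put("aggregate", List.of("parse"));
        deps.put("parse", List.of());
        System.out.println(order(deps)); // prints [parse, aggregate, report]
    }
}
```

A real scheduler would then execute the jobs in that order, and could run independent jobs in parallel; this sketch only computes the order.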

I'm the author, and wrote it so Mahout and other custom applications would be able to have a common tool that was also compatible with Cascading Flows.

I'm also the author of Cascading. But MapReduceFlow + Cascade in Cascading works quite well for most raw MR job chaining.


Cloudera has a workflow tool called Oozie that can help with this sort of chaining. It might be overkill if you just need one job to run after another.
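
For the simple case of running one job after another, the pattern can be sketched in plain Java as below. The class `SequentialChain`, the method `runAll`, and the step names are hypothetical. In a Hadoop driver using the newer API, each step would typically wrap a call to `org.apache.hadoop.mapreduce.Job#waitForCompletion(boolean)`, which returns `true` on success; since that call throws checked exceptions, the wrapper would need its own try/catch. The stand-in lambdas here only simulate that result.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: runs steps in insertion order and stops at the first failure.
class SequentialChain {

    // Returns the name of the first step that failed, or null if all succeeded.
    static String runAll(Map<String, BooleanSupplier> steps) {
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            boolean ok = step.getValue().getAsBoolean();
            if (!ok) {
                // Later steps depend on this one's output, so do not run them.
                return step.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        // Stand-ins for real jobs; a real step would submit a job and wait for it.
        steps.put("first-pass", () -> true);
        steps.put("second-pass", () -> true);
        String failed = runAll(steps);
        System.out.println(failed == null ? "all steps succeeded" : "failed at " + failed);
    }
}
```

When chaining real jobs this way, the second job's input path is usually set to the first job's output path.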
