Best practices for using Oozie for Hadoop

I have been using Hadoop for quite a while now. After some time I realized I need to chain Hadoop jobs and have some type of workflow. I decided to use Oozie, but couldn't find much information about best practices. I would like to hear from more experienced folks.

Best Regards


The best way to learn Oozie is to download the examples tar file that comes with the distribution and run each of them. It includes examples of MapReduce, Pig, and streaming workflows, as well as sample coordinator XML files.

First run the plain workflows, and once you have debugged those, move on to running the workflows with a coordinator so that you can take it step by step. Lastly, one best practice is to make most of the variables in your workflow and coordinator configurable and supply them through a component.properties file, so that you don't have to touch the XML often.
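To make the properties-file advice concrete, here is a rough sketch in Python. It embeds a small parameterized workflow definition and a matching properties file, then checks that every `${...}` placeholder in the XML has a value before substituting it. The host names, paths, and helper functions (`parse_properties`, `find_placeholders`, `resolve`) are all hypothetical and only illustrate the idea. Oozie does its own parameter resolution on the server, so this is a local sanity check at most, not a reimplementation of what Oozie does.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical properties file contents; adjust the values for your cluster.
JOB_PROPERTIES = """\
# Cluster endpoints (assumed values)
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
# Job-specific paths (assumed values)
inputDir=/user/example/input
outputDir=/user/example/output
"""

# A workflow definition in which everything environment-specific is a parameter.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.2" name="example-wf">
  <start to="mr-node"/>
  <action name="mr-node">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Map/Reduce job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

# Matches simple ${name} placeholders (not full EL expressions).
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_placeholders(xml_text):
    """Return the set of parameter names referenced in the XML."""
    return set(PLACEHOLDER.findall(xml_text))


def resolve(xml_text, props):
    """Substitute every placeholder, failing loudly if one is undefined."""
    missing = find_placeholders(xml_text) - set(props)
    if missing:
        raise KeyError(f"undefined workflow parameters: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda m: props[m.group(1)], xml_text)


if __name__ == "__main__":
    props = parse_properties(JOB_PROPERTIES)
    resolved = resolve(WORKFLOW_XML, props)
    ET.fromstring(resolved)  # raises ParseError if the XML is malformed
    print(resolved)
```

With this split, moving the job to another cluster or another input directory means editing only the properties file; the workflow XML stays untouched.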

http://yahoo.github.com/oozie/releases/3.1.0/DG_Examples.html


There is documentation about Oozie on GitHub and at Apache.

https://github.com/yahoo/oozie/wiki

http://yahoo.github.com/oozie/releases/3.1.0/DG_Examples.html

http://incubator.apache.org/oozie/index.html

The Apache documentation is being updated and should be live soon.
