
Repository organization for Hadoop project

I am starting on a new Hadoop project that will have multiple Hadoop jobs (and hence multiple JAR files). Using Mercurial for source control, I was wondering what the optimal way of organizing the repository structure would be. Should each job live in a separate repo, or would it be more efficient to keep them all in the same repo, broken down into folders?


If you're pipelining the Hadoop jobs (the output of one is the input of another), I've found it's better to keep most of the code in the same repository, since I tend to build up a lot of common methods that get reused across the various MR jobs.
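As a rough sketch (the module names here are hypothetical, and Maven is just one build option), a single-repo layout for a pipeline might look like:

    hadoop-jobs/          <- one Mercurial repo
        pom.xml           <- parent build: one JAR per module, or one fat JAR
        common/           <- shared mappers, writables, parsing utilities
        job-ingest/       <- first MR job in the pipeline
        job-aggregate/    <- consumes job-ingest's output
        cache-files/      <- files shipped to tasks via the DistributedCache

The common/ module is where those shared methods end up, so each job module depends on it instead of copy-pasting code.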

Personally, I keep the streaming jobs in a separate repo from my more traditional jobs, since there are generally no dependencies between them.

Are you planning on using the DistributedCache or streaming jobs? You might want a separate directory for files you distribute. Do you really need a JAR per Hadoop job? I've found I don't.
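On the one-JAR point: a common pattern is a single driver class that picks the job from the first command-line argument (Hadoop's own examples JAR does something similar with org.apache.hadoop.util.ProgramDriver). Here's a minimal sketch, assuming two hypothetical job classes, IngestJob and AggregateJob, that implement org.apache.hadoop.util.Tool:

    import java.util.Arrays;

    import org.apache.hadoop.util.ToolRunner;

    // One JAR holding several MR jobs, selected by the first CLI argument.
    // IngestJob and AggregateJob are placeholders for your Tool implementations.
    public class JobRunner {
        public static void main(String[] args) throws Exception {
            if (args.length == 0) {
                System.err.println("usage: hadoop jar jobs.jar JobRunner <ingest|aggregate> [job args]");
                System.exit(1);
            }
            // Everything after the job name is passed through to the job itself
            String[] jobArgs = Arrays.copyOfRange(args, 1, args.length);
            int rc;
            if ("ingest".equals(args[0])) {
                rc = ToolRunner.run(new IngestJob(), jobArgs);
            } else if ("aggregate".equals(args[0])) {
                rc = ToolRunner.run(new AggregateJob(), jobArgs);
            } else {
                System.err.println("unknown job: " + args[0]);
                rc = 1;
            }
            System.exit(rc);
        }
    }

You'd then invoke it as, say, hadoop jar jobs.jar JobRunner ingest in/ out/, and adding a job means adding a branch (or a ProgramDriver entry) rather than building a new JAR.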

If you give more details about what you plan on doing with Hadoop, I can see what else I can suggest.
