Including jar files in Hadoop streaming using Groovy

I love Hadoop streaming for its ability to quickly pump out quick-and-dirty, one-off MapReduce jobs. I also love Groovy for making all my carefully coded Java accessible to a scripting language. Now I'd like to put the two together: take a jar with some of my Java classes and use them in Groovy-based mappers and reducers.

Is there an easy way to do this? It seems like this could be a major reduction in development time for MapReduce tasks, especially those that I'm only going to run a few times.

What I'd like to do is something like:

hadoop jar streaming.jar -mapper "groovy -ne 'import a.b.c.Foo; println Foo.doSomething(line)'" -reducer "wc -l" -input input -output output -jarstoinclude ~/jarWithJava.jar

Any pointers on how to do this?


If you need to add jars to your Groovy classpath, you can put them in ~/.groovy/lib on each of your Hadoop nodes.

Or you can copy your jars to some directory on each of the nodes and specify them explicitly using the -cp flag of the groovy command.
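
As a rough sketch of the -cp route, the mapper string from the question could be built like this. The path /opt/jobs/lib/jarWithJava.jar and the helper name build_mapper_cmd are made up for illustration; a.b.c.Foo and doSomething come from the question. It assumes the jar already sits at that same path on every node.

```shell
#!/bin/sh
# Build the mapper command for a Groovy one-liner that needs an extra jar.
# $1 is the jar's path; it must be valid on every node that runs a map task.
build_mapper_cmd() {
    jar_path="$1"
    # -cp adds the jar to Groovy's classpath; -n runs the script once per
    # input line with the current line bound to the variable `line`.
    printf "groovy -cp %s -n -e 'import a.b.c.Foo; println Foo.doSomething(line)'" "$jar_path"
}

# Hypothetical location; adjust to wherever the jar was copied on the nodes.
MAPPER="$(build_mapper_cmd /opt/jobs/lib/jarWithJava.jar)"
echo "$MAPPER"
```

The resulting string would then be passed as -mapper "$MAPPER" in the hadoop jar streaming.jar command, with the rest of the arguments as in the question. If copying the jar to every node by hand is a nuisance, the generic -files option of Hadoop streaming ships a file into each task's working directory, in which case the -cp value would just be the bare file name; I haven't tried that with Groovy specifically.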


You can add the jar to the classpath by using the -libjars option. Since Groovy runs in the Hadoop job's JVM, it should be able to find the classes.
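
A sketch of where the generic -libjars option would go on the command line. Generic options have to come before the streaming-specific ones such as -mapper. The helper name submit and its dry-run trick are my own; the jar path and the Foo class are from the question. Whether a Groovy process launched by streaming actually sees a jar shipped this way is something to verify on your own cluster.

```shell
#!/bin/sh
# Jar on the submitting machine (path taken from the question).
JAR="${JAR:-$HOME/jarWithJava.jar}"
MAPPER="groovy -n -e 'import a.b.c.Foo; println Foo.doSomething(line)'"

# Run the streaming job with whatever command prefix is given.
# `submit hadoop` submits for real; `submit echo hadoop` only prints the command.
submit() {
    "$@" jar streaming.jar \
        -libjars "$JAR" \
        -mapper "$MAPPER" \
        -reducer "wc -l" \
        -input input \
        -output output
}

# Dry run: show the command line without contacting the cluster.
submit echo hadoop
```

Note that echo drops the outer quotes when printing, so the dry-run output is for eyeballing the option order rather than for copy-pasting.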
