Including jar files in Hadoop streaming using Groovy
I love Hadoop streaming for its ability to quickly pump out quick-and-dirty, one-off map reduce jobs. I also love Groovy for making all my carefully coded Java accessible to a scripting language. Now I'd like to put the two together: I'd like to take a jar with some of my Java classes and use them in Groovy-based mappers and reducers.
Is there an easy way to do this? It seems like this could be a major reduction in development time for map reduce tasks, especially those that I'm only going to run a few times.
What I'd like is to do something like:
hadoop jar streaming.jar -mapper "groovy -ne 'import a.b.c.Foo; println Foo.doSomething(line)'" -reducer "wc -l" -input input -output output -jarstoinclude ~/jarWithJava.jar
Any pointers on how to do this?
If you need to add jars to your Groovy classpath, you can put them in ~/.groovy/lib on each of your Hadoop nodes.
Or you can copy your jars to some directory in each of the nodes and specify them explicitly using the -cp flag for the groovy command.
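As a rough sketch of the -cp approach, the script below builds the streaming command from the question with the jar passed explicitly to groovy. The path /opt/jars/jarWithJava.jar is a hypothetical location and is assumed to exist at the same path on every node; streaming.jar, input and output are the placeholders from the question. The command is only printed, not run, and has not been tested against a real cluster.

```shell
#!/bin/sh
# Assumption: the jar was copied to this same path on every node.
JAR_PATH="/opt/jars/jarWithJava.jar"   # hypothetical location

# -cp adds the jar to Groovy's classpath; -n runs the -e script once per
# input line, exposing that line as the implicit variable `line`.
MAPPER="groovy -cp ${JAR_PATH} -n -e 'import a.b.c.Foo; println Foo.doSomething(line)'"

# Build the full streaming command and print it for review instead of
# running it here.
STREAMING_CMD="hadoop jar streaming.jar -mapper \"${MAPPER}\" -reducer \"wc -l\" -input input -output output"
echo "${STREAMING_CMD}"
```

Once the printed command looks right, it can be pasted into a shell on a machine with the Hadoop client installed.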
You can add the jar to the classpath by using the -libjars option. Since Groovy runs in the Hadoop job's JVM, it should be able to find the classes.
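A minimal sketch of where that option would go: -libjars is one of Hadoop's generic options, so it has to appear before the streaming-specific options such as -mapper. The jar path and the other names are the placeholders from the question. The script only prints the command, and it does not verify that a separately launched groovy process actually sees the jar, so treat that part as an assumption to check on your cluster.

```shell
#!/bin/sh
# Jar to ship with the job (path taken from the question).
LIB_JAR="${HOME}/jarWithJava.jar"

# Mapper from the question; -n runs the -e script once per input line.
MAPPER="groovy -n -e 'import a.b.c.Foo; println Foo.doSomething(line)'"

# Generic options (-libjars) come first, streaming options afterwards.
CMD="hadoop jar streaming.jar -libjars ${LIB_JAR} -mapper \"${MAPPER}\" -reducer \"wc -l\" -input input -output output"
echo "${CMD}"
```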