Elastic Map Reduce External Jars
So, it is easy enough to handle external jars when using hadoop straight up. You have -libjars option that will do this for you. The question is how do you do this with EMR. There must be an easy way of doing it. I thought -cachefile option of the CLI would do it, but I couldn't get it working 开发者_如何学Gosomehow. Any ideas anyone?
Thanks for the help.
The best luck I have had with external jar dependencies is to copy them (via bootstrap action) to /home/hadoop/lib
throughout the cluster. That path is on the classpath of every host. This technique is the only one that seems to work regardless of where the code lives that accesses external jars (tool, job, or task).
One option is to have the first step in your jobflow set up the JARs wherever they need to be. Or, if they are dependencies, you can package them in with your application JAR (which is probably in S3).
FYI for newer versions of EMR /home/hadoop/lib is not used anymore. /usr/lib/hadoop-mapreduce should be used.
精彩评论