
How can I get HBase to play nicely with sbt's dependency management?

I'm trying to get an sbt project going which uses CDH3's Hadoop and HBase. I'm trying to use a project/build/Project.scala file to declare dependencies on HBase and Hadoop. (I'll admit my grasp of sbt, Maven, and Ivy is a little weak. Please pardon me if I'm saying or doing something dumb.)

Everything went swimmingly with the Hadoop dependency. Adding the HBase dependency resulted in a dependency on Thrift 0.2.0, for which there doesn't appear to be a repo, or so it sounds from this SO post.

So, really, I have two questions: 1. Honestly, I don't want a dependency on Thrift because I don't want to use HBase's Thrift interface. Is there a way to tell sbt to skip it? 2. Is there some better way to set this up? Should I just dump the HBase jar in the lib directory and move on?

Update: This is the sbt 0.10 build.sbt file that accomplished what I wanted:

scalaVersion := "2.9.0-1"

resolvers += "ClouderaRepo" at "https://repository.cloudera.com/content/repositories/releases"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u0",
  "org.apache.hbase" % "hbase" % "0.90.1-cdh3u0"
)

ivyXML :=
  <dependencies>
    <exclude module="thrift"/>
  </dependencies>


Looking at the HBase POM file, Thrift is in the repo at http://people.apache.org/~rawson/repo. You can add that to your project, and it should find Thrift. I thought that SBT would have figured that out, but this is an intersection of SBT, Ivy and Maven, so who can really say what really should happen.
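As a rough sketch of what adding that repository could look like in an sbt 0.10 build.sbt like the one in the question: the resolver name "ThriftRepo" is an arbitrary label I made up, and the URL is copied from the HBase POM as quoted above. This sketch does not check whether that URL still serves artifacts.

```scala
// build.sbt (sbt 0.10 style, matching the question's Update)
// "ThriftRepo" is a hypothetical label; only the URL matters to Ivy.
resolvers += "ThriftRepo" at "http://people.apache.org/~rawson/repo"
```

With the resolver in place, Ivy has one more location to search when it resolves HBase's transitive Thrift dependency.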

If you really don't need Thrift, you can exclude dependencies using inline Ivy XML, as documented on the SBT wiki.

override def ivyXML = 
  <dependencies>
    <exclude module="thrift"/>
  </dependencies>
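An alternative sketch, for readers on later sbt releases: the exclusion can be attached to the HBase dependency itself with the exclude method on the module ID, rather than applied globally through Ivy XML. This assumes your sbt version provides exclude (sbt 0.10 may not), and it assumes the Thrift artifact's organization is org.apache.thrift, which you should confirm against the HBase POM.

```scala
// build.sbt, assuming an sbt release whose ModuleID supports exclude(organization, name)
// Assumption: Thrift is published as org.apache.thrift / thrift; check the HBase POM.
libraryDependencies +=
  ("org.apache.hbase" % "hbase" % "0.90.1-cdh3u0").exclude("org.apache.thrift", "thrift")
```

Scoping the exclusion to one dependency means any other library that really needs Thrift can still pull it in.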

Re: dumping the jar in the lib directory, that would be a short term gain, long term loss. It's certainly more expedient, and if this is some proof of concept you're throwing away next week, sure just drop in the jar and forget about it. But for any project that has a lifespan greater than a couple of months, it's worth it to spend the time to get dependency management right.

While all of these tools have their challenges, the benefits are:

  1. Dependency analysis can tell you when your direct dependencies have conflicting transitive dependencies. Before these tools, this usually resulted in weird runtime behavior or method not found exceptions.
  2. Upgrades are super-simple. Just change the version number, update, and you're done.
  3. It avoids having to commit binaries to version control. They can be problematic when it comes time to merge branches.
  4. Unless you have an explicit policy of how you version the binaries in your lib directory, it's easy to lose track of what versions you have.


I have a very simple example of an sbt project w/ Hadoop on github: https://github.com/deanwampler/scala-hadoop.

Look in project/build/WordCountProject.scala, where I define a variable named ClouderaMavenRepo, which defines the Cloudera repository location, and the variable named hadoopCore, which defines the specific information for the Hadoop jar.
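For orientation, here is a minimal sketch of what an sbt 0.7-style project definition with those two variables might look like. It is not copied from the repository: the repository label is made up, and the repository URL and Hadoop version are borrowed from the question above, so treat them as assumptions.

```scala
// project/build/WordCountProject.scala (sbt 0.7-style project definition; illustrative only)
import sbt._

class WordCountProject(info: ProjectInfo) extends DefaultProject(info) {
  // Extra repository to search; URL taken from the question, label is hypothetical.
  val ClouderaMavenRepo =
    "Cloudera Maven Repo" at "https://repository.cloudera.com/content/repositories/releases"

  // Managed dependency in the form organization % artifact % version (version from the question).
  val hadoopCore = "org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u0"
}
```

In this style of build, sbt picks up repositories and dependencies from the vals declared in the project class, so an HBase dependency would be one more val of the same shape.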

If you go to the Cloudera repo in a browser, you should be able to navigate to the corresponding information for Hive.
