
How to read a file from HDFS in a non-Java client

My MR job generates a report file, and an end user needs to be able to download that file by clicking a button on a normal web reporting interface. According to this O'Reilly book excerpt, there is a read-only HTTP interface. It says it's XML-based, but it seems to be simply the normal web interface intended to be viewed in a web browser, not something that can be programmatically queried, listed, and downloaded. Is my only recourse to write my own servlet-based interface? Or to execute the hadoop CLI tool?


The way to access HDFS programmatically from something other than Java is by using Thrift. There are pre-generated client classes for several languages (Java, Python, PHP, ...) included in the HDFS source tree.

See http://wiki.apache.org/hadoop/HDFS-APIs


I'm afraid you will probably have to settle for the CLI, AFAIK.

Not sure if it would fit your situation, but I think it would be reasonable to have whatever script kicks off the MR job run hadoop dfs -get ... after the job completes, copying the output to a known directory that's served.
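For what it's worth, here is a rough sketch of such a wrapper script in Python. It only shells out to the hadoop CLI (hadoop jar to run the job, then hadoop dfs -get to copy the report out). The function names, the jar name, and all paths are made up for illustration, so adjust them to your setup.

```python
import os
import subprocess
from pathlib import PurePosixPath


def build_commands(job_jar, job_args, hdfs_report, served_dir):
    """Build the two CLI invocations: run the MR job, then fetch its report.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    # Keep the report's file name when copying it into the served directory.
    report_name = PurePosixPath(hdfs_report).name
    local_target = os.path.join(served_dir, report_name)

    run_job = ["hadoop", "jar", job_jar, *job_args]
    fetch = ["hadoop", "dfs", "-get", hdfs_report, local_target]
    return run_job, fetch


def run_and_publish(job_jar, job_args, hdfs_report, served_dir):
    """Run the job and, only if it succeeds, copy the report locally."""
    run_job, fetch = build_commands(job_jar, job_args, hdfs_report, served_dir)

    # check=True raises CalledProcessError on a non-zero exit status,
    # so the copy step is skipped when the job fails.
    subprocess.run(run_job, check=True)

    os.makedirs(served_dir, exist_ok=True)
    subprocess.run(fetch, check=True)

    # Local path of the copied report, ready to be served by the web app.
    return fetch[-1]
```

You would call it as something like run_and_publish("report-job.jar", ["/input", "/user/me/out"], "/user/me/out/part-00000", "/var/www/reports"), and the web reporting interface then just links to the file in that directory. As far as I know, -get will not overwrite an existing local file, so you may want to delete or rename the previous report first.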

Sorry that I don't know of an easier solution.
