How to read a file from HDFS in a non-Java client
So my MR job generates a report file, and an end-user needs to be able to download that file by clicking a button on a normal web reporting interface. According to this O'Reilly book excerpt, there is an HTTP read-only interface. It says it's XML-based, but it seems to be simply the normal web interface intended to be viewed through a web browser, not something that can be programmatically queried, listed, and downloaded. Is my only recourse to write my own servlet-based interface? Or to execute the hadoop CLI tool?
The way to access HDFS programmatically from something other than Java is by using Thrift. There are pre-generated client classes for several languages (Java, Python, PHP, ...) included in the HDFS source tree.
See http://wiki.apache.org/hadoop/HDFS-APIs
I'm afraid you will probably have to settle for the CLI, AFAIK.
Not sure if it would fit your situation, but I think it would be reasonable to have whatever script kicks off the MR job run hadoop dfs -get ... after the job completes, copying the output to a known directory that's served.
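A minimal sketch of such a wrapper script in Python, assuming the job is launched with hadoop jar and that the jar's manifest names its main class. The jar name, the job's argument list, the HDFS output path and the served directory are all hypothetical placeholders; substitute whatever your job actually writes. The two steps run in order, and the copy is skipped if the job fails.

```python
import subprocess
from typing import Callable, List


def build_commands(job_jar: str, hdfs_output: str, served_dir: str) -> List[List[str]]:
    """Return the CLI invocations: run the MR job, then copy its output locally.

    Hypothetical assumption: the job takes its HDFS output path as its only
    argument. Adjust the first command to match your job's real arguments.
    """
    return [
        ["hadoop", "jar", job_jar, hdfs_output],
        # Copy the job's HDFS output into a local directory the web server serves.
        ["hadoop", "dfs", "-get", hdfs_output, served_dir],
    ]


def run_and_fetch(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run each command in order.

    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit status, so if the job fails the copy step is never attempted.
    The runner parameter exists only so the sequencing can be tested without Hadoop.
    """
    for cmd in commands:
        runner(cmd, check=True)
```

Usage would be something like run_and_fetch(build_commands("report-job.jar", "/user/reports/output", "/var/www/reports")), with all three paths being hypothetical examples. The download button on the reporting page then just links to the file under the served directory.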
Sorry that I don't know of an easier solution.