How to use a binary executable which takes filenames as arguments in hadoop streaming?

Say I have a binary executable which takes filenames as arguments, like 'myprog file1 file2': it reads from file1 and writes to file2. The binary executable does not read from stdin and does not write to stdout. How can I use this binary executable as a mapper or reducer in hadoop streaming? Thanks!


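Hadoop streaming hands each mapper or reducer its input on stdin and collects its output from stdout. To use your program, you would have to wrap it: first save the incoming data as a temporary file on local disk, run the program on that file, then read the results back from the output file and write them to stdout.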
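A minimal sketch of such a wrapper, assuming Python 3 is available on the task nodes. The names run_file_based and wrapper.py are made up for illustration, and the command to wrap is taken from the wrapper's own arguments. Note that it holds the whole input split in memory, so it is only a starting point:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: lets a file-to-file program act as a streaming mapper or reducer."""
import os
import subprocess
import sys
import tempfile


def run_file_based(cmd, data):
    """Run `cmd <infile> <outfile>` on `data` (bytes) and return the bytes it wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input")
        out_path = os.path.join(tmp, "output")
        # Save the records received from Hadoop to a local temporary file.
        with open(in_path, "wb") as f:
            f.write(data)
        # Call the program as `myprog file1 file2`. Anything it prints is
        # discarded so it cannot mix with the real output; a non-zero exit
        # status raises CalledProcessError and fails the task.
        subprocess.run(list(cmd) + [in_path, out_path],
                       check=True, stdout=subprocess.DEVNULL)
        # Read the results back so they can be handed to Hadoop.
        with open(out_path, "rb") as f:
            return f.read()


# Only runs when a command is given, e.g. `wrapper.py ./myprog`.
if __name__ == "__main__" and len(sys.argv) > 1:
    result = run_file_based(sys.argv[1:], sys.stdin.buffer.read())
    sys.stdout.buffer.write(result)
```

You would ship both files with the job and name the wrapper as the mapper, for example with something like -files wrapper.py,myprog -mapper "python3 wrapper.py ./myprog" (check the exact options against your Hadoop version).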

However, this largely defeats the purpose of using Hadoop to process your data. Every task pays the overhead of copying its input to local disk and reading the results back into Hadoop-land, and that can hurt performance badly.

I would recommend changing your binary executable so that it can do its I/O via stdin and stdout.
