Can Lua talk to Hadoop?

Can I use the Lua programming language for Hadoop?

If so, how?


Absolutely :) You can use Hadoop streaming like this:

Create mapper and/or reducer scripts in Lua that read records from stdin and write results to stdout:

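#!/usr/bin/env lua
-- Hadoop streaming passes each input record to this script as one line on stdin.
for line in io.lines() do
  -- Do something with the incoming row, then write the result to stdout,
  -- for example: io.write(key, "\t", value, "\n")
end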
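To make the skeleton concrete, here is a minimal sketch of a word-count mapper. The word-count task, the function names `map_line` and `emit_to_stdout`, and the choice of what counts as a word are assumptions for illustration, not part of the original answer. It relies on the streaming convention that each output line is a key and a value separated by a tab.

```lua
#!/usr/bin/env lua
-- myMapper.lua: word-count mapper sketch for Hadoop streaming (illustrative only).

-- Split one input line into lowercase words and call emit(word, 1) for each.
-- "%a+" matches runs of letters as defined by the current C locale,
-- so non-ASCII text may need a different pattern.
local function map_line(line, emit)
  for word in line:gmatch("%a+") do
    emit(word:lower(), 1)
  end
end

-- Write one tab-separated key/value record to stdout.
local function emit_to_stdout(key, value)
  io.write(key, "\t", value, "\n")
end

-- Streaming delivers the input split line by line on stdin.
for line in io.lines() do
  map_line(line, emit_to_stdout)
end
```

You can try it locally before submitting a job by piping a small text file into it, for example `lua myMapper.lua < sample.txt` (the file name is hypothetical).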

And then run your job like:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper myMapper.lua \
    -reducer myReducer.lua \
    -file /local/path/to/myMapper.lua \
    -file /local/path/to/myReducer.lua

Here, you specify your mapper and reducer scripts using -mapper and -reducer, and ship both scripts with -file to the distributed cache so that all task trackers have access to them.
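For the reducer side, a sketch of what myReducer.lua might contain is shown below. It assumes the mapper emits lines of the form key<TAB>count and relies on streaming delivering the reducer's input sorted by key, so equal keys arrive on consecutive lines. The summing logic and the names `parse_record` and `reduce_sorted` are assumptions for illustration.

```lua
#!/usr/bin/env lua
-- myReducer.lua: summing reducer sketch for Hadoop streaming (illustrative only).

-- Split a "key<TAB>value" record. Returns key and numeric value,
-- or nil when the line has no tab or the value is not a number.
local function parse_record(line)
  local key, value = line:match("^([^\t]*)\t(.*)$")
  if not key then return nil end
  return key, tonumber(value)
end

-- Sum consecutive records that share a key. This only works because the
-- input is sorted by key. `lines` is an iterator that yields records and
-- `emit` is called with (key, total) once per distinct key.
local function reduce_sorted(lines, emit)
  local current_key, total = nil, 0
  for line in lines do
    local key, value = parse_record(line)
    if key and value then
      if key ~= current_key then
        -- A new key starts: flush the total of the previous one.
        if current_key ~= nil then emit(current_key, total) end
        current_key, total = key, 0
      end
      total = total + value
    end
  end
  -- Flush the last key, if any input was seen.
  if current_key ~= nil then emit(current_key, total) end
end

reduce_sorted(io.lines(), function(key, total)
  io.write(key, "\t", total, "\n")
end)
```

A common way to approximate the job on one machine is a shell pipeline such as `cat sample.txt | lua myMapper.lua | sort | lua myReducer.lua`, where `sort` stands in for the shuffle phase and the input file name is hypothetical.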

When running with streaming, you need to make sure that the lua interpreter is installed on every machine that runs a task tracker, because the scripts are executed there through their #!/usr/bin/env lua line.

Some time ago, we experimented with using LuaJIT (which is horribly fast) for streaming from Pig. If you use Pig, you can do something like:

 OP = stream IP through `/local/path/to/script`; 

This is not the same as using Lua as the mapper or reducer itself. Instead, depending on where the operation falls in your Pig script, the output of the mapper or reducer is streamed through the Lua script.
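As a rough sketch of what such a script could look like, the example below assumes Pig's default streaming format, where each tuple arrives on stdin as one line of tab-delimited fields and is written back to stdout the same way. The transform itself (upper-casing the first field) and the names `split_tabs` and `transform` are hypothetical and only there to show the shape of the script.

```lua
#!/usr/bin/env lua
-- Sketch of a script for Pig's stream ... through operator (illustrative only).

-- Split a tab-delimited line into a list of fields, keeping empty fields.
local function split_tabs(line)
  local fields = {}
  -- Appending a tab lets one pattern capture every field, including the last.
  for field in (line .. "\t"):gmatch("([^\t]*)\t") do
    fields[#fields + 1] = field
  end
  return fields
end

-- Hypothetical transform: upper-case the first field, leave the rest untouched.
local function transform(line)
  local fields = split_tabs(line)
  if fields[1] then fields[1] = fields[1]:upper() end
  return table.concat(fields, "\t")
end

-- One input tuple per line in, one output tuple per line out.
for line in io.lines() do
  io.write(transform(line), "\n")
end
```

As with the mapper and reducer, the interpreter (lua or luajit) has to be present on the nodes where Pig runs the script.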


I've never used Lua or the streaming side of Hadoop, so this is merely a suggestion, and I'm not sure it will work:

Take a look at http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ and use Lua in place of Python?

If I were going to attempt what you are asking, that would be my starting point.
