Idle Hadoop master - how to make it do some work?
I have launched a small two-node cluster and noticed that the master stays completely idle while the slave does all the work. Is there a way to let the master run some of the tasks? I understand that a dedicated master may be necessary for a larger cluster, but on a 2-node cluster it seems like overkill.
Thanks for any tips,
Vaclav
Some more details:
The two boxes have 2 CPUs each. The cluster was set up on Amazon Elastic MapReduce, but I am running Hadoop from the command line.
The cluster I just tried it on has:
Hadoop 0.18
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)
hadoop jar /home/hadoop/contrib/streaming/hadoop-0.18-streaming.jar \
-jobconf mapred.job.name=map_data \
-file /path/map.pl \
-mapper "map.pl x aaa" \
-reducer NONE \
-input /data/part-* \
-output /data/temp/mapped-data \
-jobconf mapred.output.compress=true
where the input consists of 18 files.
Actually, the Hadoop master is not the one that does the work (the tasks you run). You can start a datanode and a tasktracker on the same machine the master runs on.
Steve Loughran on the hadoop-users list suggested that starting a tasktracker on the master would do the trick.
$ bin/hadoop-daemon.sh start tasktracker
Seems to work. You may want to adjust the number of task slots for this tasktracker.
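As a rough sketch of the slot adjustment: the properties I would expect to control this are mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, placed inside the <configuration> element of conf/hadoop-site.xml on the master, with the tasktracker restarted afterwards. The hadoop_property helper below is a made-up name that only prints the XML to paste in, and the value 1 is just an illustrative choice that leaves some headroom for the master daemons.

```shell
# Hypothetical helper: print a Hadoop <property> element for a given name and value.
hadoop_property() {
    cat <<EOF
<property>
  <name>$1</name>
  <value>$2</value>
</property>
EOF
}

# Illustrative values: one map slot and one reduce slot on the master.
hadoop_property mapred.tasktracker.map.tasks.maximum 1
hadoop_property mapred.tasktracker.reduce.tasks.maximum 1
```

Check the property names against the hadoop-default.xml shipped with your version before relying on them.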
It may be different for Hadoop 0.18, but you can try adding the IP address of the master to the conf/slaves file, then restart the cluster.
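A minimal sketch of that edit, assuming HADOOP_HOME points at your Hadoop install and that the stock bin/stop-all.sh and bin/start-all.sh scripts are what you use to restart. The add_slave helper name and the 10.0.0.1 address are placeholders, not anything from the setup above. It appends the host only if it is not already listed, so running it twice is harmless.

```shell
# Hypothetical helper: append a host to a slaves file unless it is already listed.
add_slave() {
    slaves_file="$1"
    host="$2"
    # -x matches whole lines only, -F treats the host as a literal string
    if ! grep -qxF -- "$host" "$slaves_file" 2>/dev/null; then
        echo "$host" >> "$slaves_file"
    fi
}

# Usage on the master (placeholder address):
#   add_slave "$HADOOP_HOME/conf/slaves" 10.0.0.1
#   "$HADOOP_HOME/bin/stop-all.sh" && "$HADOOP_HOME/bin/start-all.sh"
```

This assumes the existing slaves file ends with a newline; if it does not, the new entry would be glued onto the last line.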