Hadoop streaming grep does not work
grep does not seem to work as a Hadoop streaming mapper.
For: hadoop jar /usr/local/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar -input /user/root/tmp2/user.data -output /user/root/selected_data -mapper '/bin/grep 1938678460' -reducer 'wc' -jobconf mapred.output.compress=false
I get: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311) at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545) at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:17
Any ideas?
I also tried: -mapper 'cat' -reducer '/bin/grep 1938678460' (cat works, grep does not).
I also checked that /bin/grep exists on all machines, and it does.
Does grep not work with streaming, or am I missing something?
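I haven't tried this myself, but grep exits with a non-zero exit code (1) when it finds no matching lines. If the input given to a map task doesn't contain the string you grep for, the subprocess exits with code 1 and Hadoop treats the task as failed, which matches the "subprocess failed with code 1" in your trace. Maybe something like "/bin/grep 1938678460 || true" works.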
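If the "|| true" form is not accepted inside the -mapper string (streaming may run the command directly rather than through a shell, in which case "||" would not be interpreted), a small wrapper script is one way to get the same effect. The following is only a sketch: the script name grep_mapper.sh and the function name grep_tolerant are made up for illustration, and it has not been tested on a cluster. It turns grep's "no match" status (1) into success while still passing real errors (2 or higher) through.

```shell
#!/bin/sh
# grep_mapper.sh (hypothetical name): run grep, but treat "no lines matched"
# as success so a streaming task does not fail on input without a match.

grep_tolerant() {
    grep "$@"
    status=$?
    # grep exits 0 when a line matched, 1 when nothing matched,
    # and 2 or higher on a real error (bad option, unreadable file, ...).
    if [ "$status" -eq 1 ]; then
        return 0
    fi
    return "$status"
}

# When invoked with arguments, forward them all to grep and read from stdin,
# which is where streaming feeds the input records.
if [ "$#" -gt 0 ]; then
    grep_tolerant "$@"
fi
```

Assuming the script is shipped to the task nodes (for example with the streaming -file option) and is executable, the job would then use something like -mapper 'grep_mapper.sh 1938678460' in place of '/bin/grep 1938678460'.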