
Hadoop, map/reduce output file (part-00000) and the distributed cache

The value output from my map/reduce is a BytesWritable array, which is written to the output file part-00000 (Hadoop does so by default). I need this array for my next map function, so I wanted to keep it in the distributed cache. Can somebody tell me how I can read from the output file (part-00000), which may not be a text file, and store it in the distributed cache?
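One common pattern worth noting alongside the answer below: you don't have to convert part-00000 to text at all. A minimal sketch, assuming the old `org.apache.hadoop.mapred` API and that part-00000 was written as a SequenceFile with BytesWritable values (the HDFS path here is hypothetical):

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.ReflectionUtils;

public class CacheSketch {

    // In the driver of the second job: register the first job's binary
    // output file in the distributed cache as-is.
    public static void addToCache(JobConf conf) throws Exception {
        // Hypothetical path -- use the output directory of your first job.
        DistributedCache.addCacheFile(
                new URI("hdfs:///user/me/job1-output/part-00000"), conf);
    }

    // In the mapper's configure(): open the locally cached copy as a
    // SequenceFile and read the BytesWritable values back.
    public static void readCached(JobConf conf) throws IOException {
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(
                FileSystem.getLocal(conf), cached[0], conf);
        Writable key = (Writable) ReflectionUtils.newInstance(
                reader.getKeyClass(), conf);
        BytesWritable value = new BytesWritable();
        while (reader.next(key, value)) {
            // Only the first value.getLength() bytes of the backing
            // array are valid data.
            byte[] bytes = value.getBytes();
            // ... stash the array for use in map()
        }
        reader.close();
    }
}
```

This avoids a second job entirely when all you need is the array available on each task node.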


My suggestion:

Create a new Hadoop job with the following properties:

  • Input the directory with all the part-... files.
  • Create a custom OutputFormat class that writes to your distributed cache.
  • Configure the job to look essentially like this:

    // Read the binary part-... files back as SequenceFiles
    conf.setInputFormat(SequenceFileInputFormat.class);
    // Pass every record through unchanged
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    // Your custom OutputFormat from the step above
    conf.setOutputFormat(DistributedCacheOutputFormat.class);
    
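A skeleton of what such a custom OutputFormat could look like — a sketch only, assuming the old `org.apache.hadoop.mapred` API; `DistributedCacheOutputFormat` is not a stock Hadoop class, and everything beyond the `OutputFormat` contract here is an assumption you would fill in:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

public class DistributedCacheOutputFormat
        implements OutputFormat<Writable, BytesWritable> {

    @Override
    public RecordWriter<Writable, BytesWritable> getRecordWriter(
            FileSystem ignored, JobConf job, String name,
            Progressable progress) throws IOException {
        return new RecordWriter<Writable, BytesWritable>() {
            @Override
            public void write(Writable key, BytesWritable value)
                    throws IOException {
                // Write the bytes to a side location that the driver
                // later registers with DistributedCache.addCacheFile(...).
            }

            @Override
            public void close(Reporter reporter) throws IOException {
                // Flush and close the side file here.
            }
        };
    }

    @Override
    public void checkOutputSpecs(FileSystem ignored, JobConf job)
            throws IOException {
        // No output preconditions for this sketch.
    }
}
```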

Have a look at the Yahoo Hadoop tutorial because it has some examples on this point: http://developer.yahoo.com/hadoop/tutorial/module5.html#outputformat

HTH

