Shared variable in MapReduce
I need a variable that is shared between reduce tasks, and that each reduce task can read and write atomically. I need it in order to give a unique identifier to each file created by a reduce task (the number of files created by the reduce tasks is not deterministic).
Thanks
As I understand it, ZooKeeper is built specifically to provide atomic access to cluster-wide variables.
I would recommend using FileSystem.createNewFile().
Have a look here:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/fs/FileSystem.html#createNewFile%28org.apache.hadoop.fs.Path%29
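The idea behind this suggestion, as I read it, is that creating a file is an atomic check-and-create, so whichever task creates a given marker file first owns that identifier. Here is a rough sketch of the pattern. Note that it is written against the local filesystem with java.nio.file.Files.createFile so that it runs without a cluster; it is not the Hadoop call itself. On HDFS you would presumably call FileSystem.createNewFile(Path) instead and treat a false return as "already taken". The class name MarkerFileIds, the method claimNextId and the "id-N" marker naming are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: claims the lowest free integer id by atomically
// creating an empty marker file named "id-<n>" in a shared directory.
class MarkerFileIds {

    static int claimNextId(Path sharedDir) throws IOException {
        for (int id = 0; ; id++) {
            try {
                // createFile fails if the file already exists, and the
                // existence check and the creation happen as one atomic step.
                Files.createFile(sharedDir.resolve("id-" + id));
                return id; // we created the marker, so this id is ours
            } catch (FileAlreadyExistsException e) {
                // Another task already owns this id; try the next one.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ids");
        System.out.println(claimNextId(dir));
        System.out.println(claimNextId(dir));
    }
}
```

Each claim scans from 0, so it gets slower as more ids are taken, and every attempt is a round trip to the filesystem. That is probably acceptable for a handful of files per reducer, but not for very large numbers.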
All the output files produced by the reducers already have unique names, such as part-r-00001. If you need that number in your code, you can read the partition number.
Centralized counters that must be guaranteed unique break a lot of the scalability of Hadoop.
So if you need something different, I would use something like a SHA-1 of the reducer's task ID to get a value that is unique across multiple jobs.
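A minimal sketch of the hashing approach, using only the JDK. The class TaskIdHash, the method uniqueFileName and the "hash-sequence" naming scheme are hypothetical, and the task ID string in main is just an example of the usual format. In a real reducer you would pass in whatever ID string the framework gives you for the task. Since one reducer may create several files, the sketch appends a per-task sequence number, which needs no coordination because it is local to the task.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives file names from a hash of the task ID.
class TaskIdHash {

    // Returns the SHA-1 digest of the input as 40 lowercase hex characters.
    static String sha1Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumed naming scheme: <sha1 of task id>-<per-task file counter>.
    static String uniqueFileName(String taskId, int sequence) {
        return sha1Hex(taskId) + "-" + sequence;
    }

    public static void main(String[] args) {
        String taskId = "task_200707121733_0003_r_000005"; // example value
        System.out.println(uniqueFileName(taskId, 0));
        System.out.println(uniqueFileName(taskId, 1));
    }
}
```

Because the task ID includes the job ID, two reducers in different jobs get different hashes, and no shared counter is involved.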