Is there a way to "set" Hadoop Counter instead of incrementing it?
The API only provides methods to increment a counter in a Mapper or Reducer. Is there a way to just set it, or to increment its value only once irrespective of the number of times mappers and reducers are run?
What are you trying to achieve? This is inherently tricky: what if multiple mappers try to set the counter? Which one should win? Counters are typically increment-only because the framework can do that very, very quickly and efficiently.
You can't set the counter because the counters are summed from each of the tasks and aggregated into a top-level counter.
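To illustrate the summing described above, here is a minimal plain-Java sketch. It does not use the Hadoop API; `TaskCounter`, `aggregate`, and `totalWhenEveryTaskSets` are hypothetical names invented for this example, and the model assumes the job-level value is simply the sum of each task's local value.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterAggregationSketch {

    /** Hypothetical stand-in for one task's local copy of a counter (not a Hadoop class). */
    static final class TaskCounter {
        private long value;

        void increment(long delta) { value += delta; }
        void setValue(long newValue) { value = newValue; }
        long getValue() { return value; }
    }

    /** Models the framework side: the job-level value is the sum over all tasks. */
    static long aggregate(List<TaskCounter> perTask) {
        long total = 0;
        for (TaskCounter counter : perTask) {
            total += counter.getValue();
        }
        return total;
    }

    /** Every task "sets" its own local counter to the same value, then the values are summed. */
    static long totalWhenEveryTaskSets(int numTasks, long value) {
        List<TaskCounter> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            TaskCounter counter = new TaskCounter();
            counter.setValue(value); // only affects this task's local copy
            perTask.add(counter);
        }
        return aggregate(perTask);
    }

    public static void main(String[] args) {
        // Four tasks each set their local counter to 1; the summed total is 4, not 1.
        System.out.println(totalWhenEveryTaskSets(4, 1)); // prints 4
    }
}
```

Under this model, a "set" inside a task behaves like a per-task contribution to the sum, which is why a single job-wide value cannot be pinned from inside mappers or reducers.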
I have used ZooKeeper within MapReduce jobs for small amounts of communication or coordination between tasks, or for flagging certain things that happened in a job or task.
This cannot be done from the Hadoop API, at least, as @orangeoctupus pointed out as well. The approach I used to achieve this was to set the value in the Job's Context properties; the properties can then be read after the job has run. Inelegant, but a workaround!
The interface org.apache.hadoop.mapreduce.Counter defines a method setValue, but if it works globally like it seems to based upon the description, I would agree with other answers that there aren't many use cases for it that are also good ideas...