
Hadoop mysql limiting the reducers

I'm using Hadoop to update some records in a MySQL db. The issue I'm seeing is that in certain cases, multiple reducers are launched for the same key set. I've seen up to 2 reducers running on different slaves for the same key. This leads to both reducers updating the same record in the db.

I was thinking of turning off autocommit mode to alleviate this issue and doing the commit as part of the "cleanup" operation in the reducer, but I was wondering what to do with the reducer(s) that lag behind. Would the cleanup operation still be called for them? If so, is there a way to tell whether the reducer finished normally or not, since I'd like to call "rollback" on the reducer(s) that didn't finish processing the data entirely?


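You can add the following MapReduce job property:

mapred.reduce.tasks.speculative.execution

with the value false. This turns off speculative execution for reduce tasks, which is the feature that launches a second attempt of the same reducer on another node. (mapred.map.tasks.speculative.execution is the equivalent switch for map tasks.)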


Two things:

  1. I really doubt that two equal keys get partitioned to different reducers, since HashPartitioner is used by default. Make sure your key class overrides hashCode consistently with equals.
  2. You have the option to set the number of reduce tasks. It can be done with an API call to Job.setNumReduceTasks(X). Obviously you can set this to 1.
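
To illustrate the first point, here is a minimal sketch in plain Java with no Hadoop dependency. The class names PartitionSketch and RecordKey are hypothetical, and partitionFor only mirrors the arithmetic I understand HashPartitioner to apply to a key's hashCode; a real key class would also implement WritableComparable, which is left out here.

```java
import java.util.Objects;

public class PartitionSketch {

    // Hypothetical key type, used only for illustration.
    // A real Hadoop key would also implement WritableComparable.
    static final class RecordKey {
        final String table;
        final long id;

        RecordKey(String table, long id) {
            this.table = table;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RecordKey)) return false;
            RecordKey other = (RecordKey) o;
            return id == other.id && table.equals(other.table);
        }

        // Built from the same fields as equals(), so two equal keys
        // always produce the same hash, even in different JVMs.
        @Override
        public int hashCode() {
            return Objects.hash(table, id);
        }
    }

    // Assumed to match HashPartitioner: clear the sign bit, then take
    // the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        RecordKey a = new RecordKey("users", 42L);
        RecordKey b = new RecordKey("users", 42L);
        // Equal keys land in the same partition: prints "true".
        System.out.println(partitionFor(a, 8) == partitionFor(b, 8));
    }
}
```

If hashCode were left at the Object default, it would not be derived from the key's fields, so two equal keys created in different map tasks could hash differently and reach different reducers. With a field-based hashCode, two reducers working on the same key points to duplicate task attempts rather than to partitioning.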


In general (without knowing your use case), it's usually preferable to avoid side effects with Hadoop, that is, relying on a third-party system outside of Hadoop. Doing so can bottleneck your performance and potentially topple that system over when many tasks hit it concurrently. I would recommend that you investigate Sqoop from Cloudera to do a batch load after the map-reduce job is complete. I have had good success using it as a bulk loader.

Sqoop Documentation

If you would still like to index directly from Hadoop, you can use the fair scheduler to rate-limit the number of mappers or reducers that can run at any time. Start the job with mapred.queue.name set to your rate-limited queue. The parameters you are looking for are maxMaps and maxReduces.

Fair Scheduler Documentation
