Cassandra setInputSplitSize is not working properly
I am using Hadoop + Cassandra. I use setInputSplitSize(1000) so the mappers are not overloaded (and don't run out of heap memory), since the default is 64K rows per split. Altogether I have only 2M rows to process, so every split should be ~1000 rows.
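For reference, the relevant job setup looks roughly like this (a minimal sketch; the keyspace and column family names are placeholders, and the exact ConfigHelper method names vary slightly between Cassandra 0.7 and 0.8):

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SplitSizeSetup {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "split-size-test");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);

            // "MyKeyspace"/"MyColumnFamily" are placeholders for the real names.
            ConfigHelper.setInputColumnFamily(job.getConfiguration(),
                                              "MyKeyspace", "MyColumnFamily");

            // Ask for ~1000 rows per input split instead of the 64K default.
            ConfigHelper.setInputSplitSize(job.getConfiguration(), 1000);
        }
    }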
The problem is that some mappers still receive 64K rows and I do not know why. Usually 2-3 mappers show 4000%-64000% in their status instead of just 100%. When I check the log, I find 40K-64K rows processed. They do not crash or run out of memory, but these 2-3 tasks start in the middle of the job and keep running for 2-3 hours after all the others have finished.
Is this normal behaviour? What can I do to make the split size stick?
Thank you in advance!
What version of Cassandra are you using? If it's not 0.7.8 or 0.8.4, try upgrading first.
If you still see this behavior, please create a bug report on https://issues.apache.org/jira/browse/CASSANDRA.