How to estimate the proper batch size when processing DB records
So let's say I have a processor that takes records from one table, transforms them, and places them into another table. I'm trying to figure out how many records I can load into memory and process at once.
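To make that concrete, the loop is roughly the shape of the JDBC sketch below (simplified; the table and column names source_table, target_table, id, payload and the transform are just placeholders for my real schema and logic):

```java
import java.sql.*;

public class BatchProcessor {
    public static void process(Connection conn, int batchSize) throws SQLException {
        try (PreparedStatement select = conn.prepareStatement(
                 "SELECT id, payload FROM source_table",
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
             PreparedStatement insert = conn.prepareStatement(
                 "INSERT INTO target_table (id, payload) VALUES (?, ?)")) {

            // Hint to the driver to fetch rows in chunks instead of all at once.
            select.setFetchSize(batchSize);

            int pending = 0;
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    insert.setLong(1, rs.getLong("id"));
                    insert.setString(2, transform(rs.getString("payload")));
                    insert.addBatch();
                    if (++pending == batchSize) {
                        insert.executeBatch();   // flush one batch of inserts
                        pending = 0;
                    }
                }
            }
            if (pending > 0) {
                insert.executeBatch();           // flush the remainder
            }
        }
    }

    private static String transform(String payload) {
        return payload.toUpperCase();            // placeholder transformation
    }
}
```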
Obviously that would depend on a lot of factors: the amount of data in each record, whether there are any BLOBs, the number of columns, the database type, drivers, frameworks, how much memory is available on the box, whether other memory-consuming processes are running in the same environment, etc.
From tests I can see that it is able to process 10,000 records at once but fails (with an OOM) when querying 100,000.
I guess I'm looking for a good strategy for figuring out the proper batch size.
Or should I just go with a very conservative, low batch size like 100 and not worry about the query overhead?
Thanks.
I would run tests with different batch sizes until you stop seeing an improvement. You might see something like:
1 : 10 ms
10 : 5 ms each
100 : 3.5 ms each
1K : 3 ms each
10K : 3.2 ms each
In which case I would pick 1K or 100 to be on the low side.
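As a rough illustration, a harness along these lines could time the candidate sizes; it assumes a process(Connection, int batchSize) method like the one sketched in the question and a known number of test records already loaded into the source table:

```java
import java.sql.Connection;

public class BatchSizeBenchmark {
    public static void run(Connection conn, int totalRecords) throws Exception {
        int[] candidates = {1, 10, 100, 1_000, 10_000};
        for (int batchSize : candidates) {
            long start = System.nanoTime();
            BatchProcessor.process(conn, batchSize);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("batch %6d : %.3f ms per record%n",
                              batchSize, (double) elapsedMs / totalRecords);
            // Reset the target table between runs if you need repeatable numbers.
        }
    }
}
```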
Make the value hand-configurable, run some benchmarks, and set it to a sensible default. I've found in the past that the performance gain becomes smaller and smaller as the batch size increases: the jump from 1 record at a time to 10 is dramatic, from 10 to 100 less so, from 100 to 1000 even less, and so on.
So I would run benchmarks to find out what is reasonable and then leave it configurable; you never know when you'll need to adjust it for some odd reason.
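One simple way to keep it hand-configurable (just a sketch; the property name batch.size and the default of 1000 are arbitrary choices) is to read it from a system property:

```java
public final class BatchConfig {
    private BatchConfig() {}

    // Reads -Dbatch.size=N from the JVM arguments, falling back to 1000.
    public static int batchSize() {
        return Integer.getInteger("batch.size", 1_000);
    }
}
// Run with e.g.:  java -Dbatch.size=500 MyProcessorMain
```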