How to estimate a proper batch size when processing DB records

So let's say I have a processor that takes records from one table, transforms them, and places them into another table. I'm trying to figure out how many records I can load into memory and process at once.

Obviously that would depend on a lot of factors: the amount of data in each record, whether there are any BLOBs, the number of columns, the database type, the drivers and frameworks involved, how much memory is available on the box, whether any other memory-hungry processes are running in the same environment, etc.

From tests I can see that it can process 10,000 records at once, but fails with an OOM when querying 100,000.

I guess I'm looking for a good strategy to figure out a proper batch size.

Or should I just go with a very conservative, low batch size like 100 and not worry about any query overhead?
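
For reference, here is a minimal sketch (in Java with plain JDBC) of the kind of batched loop I mean; the connection string, the source_table/target_table schema, and the pass-through transform are all placeholders, not my real setup:

```java
import java.sql.*;

public class BatchProcessor {
    private static final int BATCH_SIZE = 1_000; // the number this question is about

    public static void main(String[] args) throws SQLException {
        // Placeholder connection details.
        String url = "jdbc:postgresql://localhost/mydb";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
            conn.setAutoCommit(false);
            long lastId = 0;
            while (lastId >= 0) {
                lastId = processBatch(conn, lastId); // -1 signals no more rows
            }
        }
    }

    // Reads one batch keyed on id (keyset pagination, so no growing OFFSET scans),
    // transforms each row, and batch-inserts into the target table.
    // Returns the last id seen, or -1 when the source is exhausted.
    private static long processBatch(Connection conn, long afterId) throws SQLException {
        String select = "SELECT id, payload FROM source_table WHERE id > ? ORDER BY id LIMIT ?";
        String insert = "INSERT INTO target_table (id, payload) VALUES (?, ?)";
        long lastId = -1;
        try (PreparedStatement read = conn.prepareStatement(select);
             PreparedStatement write = conn.prepareStatement(insert)) {
            read.setLong(1, afterId);
            read.setInt(2, BATCH_SIZE);
            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    lastId = rs.getLong("id");
                    String payload = rs.getString("payload"); // real transform goes here
                    write.setLong(1, lastId);
                    write.setString(2, payload);
                    write.addBatch();
                }
            }
            write.executeBatch();
            conn.commit();
        }
        return lastId;
    }
}
```

This way only BATCH_SIZE rows are ever held in memory at a time; the question is what to set that constant to.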

Thanks.


I would run tests with different batch sizes until you stop seeing any improvement. You might see something like:

1 : 10 ms
10 : 5 ms each
100 : 3.5 ms each
1K : 3 ms each
10K : 3.2 ms each

In which case I would pick 1K, or 100 to stay on the low side.
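
A minimal, self-contained sketch of that kind of benchmark; processBatch here is a dummy stand-in for your real read/transform/write step, so the printed numbers only mean something once the real batch code is plugged in:

```java
import java.util.function.IntConsumer;

public class BatchSizeBenchmark {
    public static void main(String[] args) {
        // Stand-in for the real "read n records, transform, write" step.
        IntConsumer processBatch = n -> {
            long sink = 0;
            for (int i = 0; i < n; i++) sink += i;  // dummy work
            if (sink == -1) System.out.print("");   // keep the loop from being optimized away
        };

        int totalRecords = 100_000;
        for (int size : new int[] {1, 10, 100, 1_000, 10_000}) {
            long start = System.nanoTime();
            for (int done = 0; done < totalRecords; done += size) {
                processBatch.accept(Math.min(size, totalRecords - done));
            }
            double msPerRecord = (System.nanoTime() - start) / 1e6 / totalRecords;
            System.out.printf("batch %6d : %.4f ms per record%n", size, msPerRecord);
        }
    }
}
```

Run it against your real tables and pick the smallest size where the per-record cost has flattened out.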


Make the value hand-configurable, and run some benchmarks. Set the value to a sensible number. I've found in the past that the performance gain shrinks as the batch size grows: the jump from 1 record at a time to 10 is dramatic, from 10 to 100 less so, from 100 to 1,000 even less so, and so on.

So I would run benchmarks to find out what is reasonable, then leave it configurable; you never know when you'll need to adjust it for some odd reason.
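
One minimal way to do that, assuming a plain JVM system property; the property name processor.batchSize and the default of 1,000 are made up for the example:

```java
public class BatchConfig {
    private static final int DEFAULT_BATCH_SIZE = 1_000; // whatever your benchmarks settled on

    // Reads the batch size from -Dprocessor.batchSize=..., falling back to the
    // benchmarked default on a missing, malformed, or non-positive value.
    public static int batchSize() {
        String raw = System.getProperty("processor.batchSize");
        if (raw == null) return DEFAULT_BATCH_SIZE;
        try {
            int n = Integer.parseInt(raw);
            return n > 0 ? n : DEFAULT_BATCH_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_BATCH_SIZE;
        }
    }

    public static void main(String[] args) {
        System.out.println("Using batch size: " + batchSize());
    }
}
```

Then adjusting it on a memory-constrained box is a launch-flag change rather than a rebuild.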
