Optimizing processing and management of large Java data arrays

I'm writing some pretty CPU-intensive, concurrent numerical code that will process large amounts of data stored in Java arrays (e.g. lots of double[100000]s). Some of the algorithms might run millions of times over several days so getting maximum steady-state performance is a high priority.

In essence, each algorithm is a Java object that has a method API something like:

   public double[] runMyAlgorithm(double[] inputData);

or alternatively a reference to an output array could be passed in for the algorithm to fill:

   public void runMyAlgorithm(double[] inputData, double[] outputData);

Given this requirement, I'm trying to determine the optimal strategy for allocating / managing array space. Frequently the algorithms will need large amounts of temporary storage space. They will also take large arrays as input and create large arrays as output.

Among the options I am considering are:

  • Always allocate new arrays as local variables whenever they are needed (e.g. new double[100000]). Probably the simplest approach, but will produce a lot of garbage.
  • Pre-allocate temporary arrays and store them as final fields in the algorithm object - the big downside is that only one thread could run the algorithm at any one time.
  • Keep pre-allocated temporary arrays in ThreadLocal storage, so that a thread can use a fixed amount of temporary array space whenever it needs it. ThreadLocal would be required since multiple threads will be running the same algorithm simultaneously (see the sketch after this list).
  • Pass around lots of arrays as parameters (including the temporary arrays for the algorithm to use). Not good, since it will make the algorithm API extremely ugly if the caller has to be responsible for providing temporary array space.
  • Allocate extremely large arrays (e.g. double[10000000]) but also provide the algorithm with offsets into the array so that different threads will use a different area of the array independently. Will obviously require some code to manage the offsets and allocation of the array ranges.
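
For the ThreadLocal option, a minimal sketch of what I have in mind (class and field names are hypothetical; assumes the scratch size is known up front and Java 8+ for ThreadLocal.withInitial):

    public class MyAlgorithm {
        private static final int TEMP_SIZE = 100_000;

        // Each thread lazily gets its own scratch buffer and reuses it across calls.
        private static final ThreadLocal<double[]> TEMP =
                ThreadLocal.withInitial(() -> new double[TEMP_SIZE]);

        public void runMyAlgorithm(double[] inputData, double[] outputData) {
            double[] temp = TEMP.get(); // allocates once per thread, then reuses
            // ... use temp as working space, write results into outputData ...
        }
    }

The first call on each thread pays the allocation cost once; every later call reuses the same buffer, so steady-state runs generate no garbage for temporary storage. The trade-off is that the buffers stay reachable for the lifetime of each thread.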

Any thoughts on which approach would be best (and why)?


What I have noticed when working with memory in Java is the following. If your memory-usage patterns are simple (mostly 2-3 kinds of allocation) you can usually do better than the default allocator. You can either preallocate a pool of buffers at application startup and use them as needed, or go the other route (allocate a huge array at the beginning and hand out pieces of it when needed). In effect you are writing your own memory allocator - but chances are you will do a worse job than Java's default allocator.
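
As an illustration of the pool variant, a minimal sketch (the BufferPool class is hypothetical, assuming one standardized buffer size):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // A fixed pool of equally sized buffers, all preallocated at startup.
    public class BufferPool {
        private final BlockingQueue<double[]> pool;

        public BufferPool(int buffers, int bufferSize) {
            pool = new ArrayBlockingQueue<>(buffers);
            for (int i = 0; i < buffers; i++) {
                pool.add(new double[bufferSize]);
            }
        }

        public double[] acquire() throws InterruptedException {
            return pool.take(); // blocks until a buffer is free
        }

        public void release(double[] buffer) {
            pool.offer(buffer); // caller must stop using the buffer after this
        }
    }

Note the usual pool hazard: a caller that keeps using a buffer after releasing it will silently corrupt another thread's data.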

I would probably try to do the following: standardize the buffer sizes and allocate normally. That way, after a while, the only memory allocations/deallocations will be of fixed sizes, which will greatly help the garbage collector run fast. Another thing I would do is make sure at algorithm design time that the total memory needed at any one point will not exceed something like 80-85% of the memory of the machine, in order not to trigger a full collection inadvertently.
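
A rough illustration of that budget check at startup (the method and the 80% factor are just the heuristic above, not a hard rule; checks against the heap, and Double.BYTES needs Java 8+):

    // Fail fast if planned peak buffer usage would exceed ~80% of the max heap.
    public class HeapBudget {
        public static void checkBudget(long numBuffers, long bufferSize) {
            long maxHeap = Runtime.getRuntime().maxMemory();
            long plannedBytes = numBuffers * bufferSize * Double.BYTES; // 8 bytes per double
            if (plannedBytes > maxHeap * 8 / 10) {
                throw new IllegalStateException(plannedBytes
                        + " bytes planned, over 80% of max heap " + maxHeap);
            }
        }
    }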

Apart from those heuristics, I would test the hell out of any solution I picked and see how it works in practice.


Allocating big arrays is relatively cheap for the GC. You tend to use up your Eden space quickly, but the cost is largely per object. I suggest you write the code in the simplest manner possible and optimise it later, after profiling the application. A double[100000] is less than a MB, so you can fit over a thousand of them in a GB.

Memory is a lot cheaper than it used to be. An 8 GB server costs about £850; a 24 GB server costs about £1,800 (a 24 GB machine could allow you 24K x double[100000] arrays). You may find that using a large heap size, or even a large Eden size, gives you the efficiency you want.
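
If you take that route, you can size the heap and young generation explicitly when launching the JVM; the sizes below are purely illustrative, not a recommendation:

    java -Xms20g -Xmx20g -Xmn8g MyApp

Setting -Xms equal to -Xmx avoids heap-resizing pauses, and a large -Xmn gives short-lived temporary arrays room to die cheaply in Eden without being promoted.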
