For millions of objects, is it better to store them in an array or a database like Redis if the objects are needed in real time?
I am developing a simulation in which there can be millions of entities that can interact with each other. At the moment, all the entities are stored in a list. Would it be better to store the objects in a database like Redis instead of a list?
Note: I assumed this was being implemented in Java (force of habit). My answer is not terribly useful if it is not Java.
Making lots of assumptions about your requirements, I'd consider Redis if:
- You are running into unacceptable GC pauses as a result of your millions of objects OR
- The entities you create can be reused across multiple simulation runs
Java apps with giant heaps and lots of long-lived objects can run into very long GC pauses, depending on workload: the old generation fills up with these millions of objects, and they are never eligible for collection. Regardless, a full collection will happen periodically (unless you're a GC tuning master) and has to scan those millions of objects in the old generation. This can take many seconds each time it happens, and the application is frozen during that time. If this is happening and you don't like it, you can off-load all these long-lived objects to Redis, and pay the serialize/deserialize cost of accessing them instead of the GC pauses.
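As a rough illustration of that trade-off, here is a minimal sketch of off-loading entities to Redis from Java. It assumes the Jedis client, a Redis instance on localhost, and plain Java serialization; the `EntityStore` class and the `entity:<id>` key scheme are hypothetical placeholders, not anything from your code.

```java
// Minimal sketch: keep entities in Redis instead of the JVM heap.
// Assumes the Jedis client and a Redis server on localhost:6379.
import redis.clients.jedis.Jedis;
import java.io.*;

public class EntityStore {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Serialize the entity and store it under a per-entity key,
    // so it no longer sits in the old generation between accesses.
    public void save(long id, Serializable entity) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(entity);
        }
        jedis.set(("entity:" + id).getBytes(), bytes.toByteArray());
    }

    // Fetch and deserialize on demand -- this round trip is the cost
    // you pay instead of the GC scanning millions of live objects.
    public Object load(long id) throws IOException, ClassNotFoundException {
        byte[] data = jedis.get(("entity:" + id).getBytes());
        if (data == null) return null;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

Whether that round trip is acceptable depends entirely on how often each entity is touched per simulation step.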
On the other point about reusing entities: if you're loading up a big Redis db and then dropping all its data when the simulation ends, it feels a bit wasteful. If you can re-use entities across simulation runs you might save yourself a bunch of time by persisting them in Redis.
The best choice depends on a number of factors, including how you access data, whether it will fit in memory, and what the distribution of accesses looks like. As a broad generalization, keeping data in memory is always faster than on disk, and keeping it in-process is faster than keeping it elsewhere.
If your data fits in memory, is accessed in a manner that means you can use basic data structures like lists/arrays and hashtables efficiently, and all items are accessed roughly equally often, keeping your data in memory is probably the best option.
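For example, a minimal sketch of that in-process option, with a hypothetical `Entity` class keyed by id in a plain `HashMap`:

```java
// Minimal sketch of the in-memory, in-process option: a plain HashMap keyed
// by a hypothetical entity id. There is no serialization and no network hop.
import java.util.HashMap;
import java.util.Map;

public class InMemorySim {
    static final class Entity {
        final long id;
        double x, y;              // hypothetical simulation state
        Entity(long id) { this.id = id; }
    }

    public static void main(String[] args) {
        Map<Long, Entity> entities = new HashMap<>();
        for (long i = 0; i < 1_000_000; i++) {
            entities.put(i, new Entity(i));
        }
        // An interaction between two arbitrary entities is a pair of O(1) lookups.
        Entity a = entities.get(42L);
        Entity b = entities.get(987_654L);
        a.x += b.x * 0.01;        // placeholder interaction rule
    }
}
```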
If your data fits in memory, but you need to access it in complex ways, you may be best choosing a datastore like redis that supports in-memory databases.
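If "complex ways" includes something like range queries over an attribute, a Redis sorted set is one option. A minimal sketch, again assuming the Jedis client; the key name `entities:by-energy` and the scores are made up for illustration:

```java
// Minimal sketch: a Redis sorted set answers range-style queries that a flat
// list or hash map cannot answer directly. Assumes the Jedis client.
import redis.clients.jedis.Jedis;

public class RangeQueryDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Index entities by some numeric attribute (a made-up "energy").
            jedis.zadd("entities:by-energy", 12.5, "entity:1");
            jedis.zadd("entities:by-energy", 47.0, "entity:2");
            jedis.zadd("entities:by-energy", 88.3, "entity:3");

            // Fetch only the entities whose score falls in a range -- Redis
            // does the filtering server-side, in memory.
            for (String member : jedis.zrangeByScore("entities:by-energy", 0, 50)) {
                System.out.println("in range: " + member);
            }
        }
    }
}
```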
If your data doesn't fit in memory, or you have a very uneven access pattern such that evicting the least used data to disk might allow other things to be loaded, speeding up your task in general, a regular disk-based datastore may be a better choice.
A list is not necessarily the best data structure unless "interaction" is limited to the respective next or previous element. Random access (by index) is very slow on a list.
Lists rocket at inserting at the front and end, at finding the next (or previous) element, and at inserting one in between. They totally blow for accessing element 164553 and then element 10657, being O(N) on random access. Thus "interact with each other" suggests that a list is a bad choice.
It very much depends on the access and allocation patterns, but a vector or deque will likely be much better suited than a list for your simulation.
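To make the complexity argument concrete in Java terms (the same point applies to a C++ std::list versus std::vector), here is a small illustrative comparison of random access on LinkedList versus ArrayList; exact timings will vary by machine and JVM:

```java
// Illustrative sketch of why random access by index hurts on a linked list:
// LinkedList.get(i) walks O(N) nodes, ArrayList.get(i) is O(1).
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class RandomAccessDemo {
    public static void main(String[] args) {
        int n = 200_000;
        List<Integer> linked = new LinkedList<>();
        List<Integer> array = new ArrayList<>();
        for (int i = 0; i < n; i++) { linked.add(i); array.add(i); }

        Random rnd = new Random(1);
        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < 1_000; i++) sum += array.get(rnd.nextInt(n));
        long arrayNs = System.nanoTime() - t0;

        rnd = new Random(1);
        t0 = System.nanoTime();
        for (int i = 0; i < 1_000; i++) sum += linked.get(rnd.nextInt(n));
        long linkedNs = System.nanoTime() - t0;

        System.out.printf("ArrayList: %d ns, LinkedList: %d ns (sum=%d)%n",
                arrayNs, linkedNs, sum);
    }
}
```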
Redis is based on a hash table, which has much better characteristics for random access, but it will most likely still be slower, because of the considerable overhead: you serialize the data, it goes over a socket, Redis deserializes and processes it and sends a reply, and you parse that reply.