Data Structure To Store Arbitrary Database Tables

2023-01-10 18:09 问答作者：

I'd like to design a JVM data structure (Java/Scala) that can be used to represent and store the contents of arbitrary relational database tables. The data structure should be fast (not too gc-intensive, cache-friendly) and memory efficient, so larger tables can fit in RAM.

One memory-efficient solution is to store each column separately in a primitive array, but I'm worried about the cache friendliness because items in the same row are not stored together. A row with N columns will incur N cache misses, no matter how narrow the columns.

Another solution is to store each row in an object array where each element represents a field and is cast to the correct type on retrieval, but this requires storing numeric types in their boxed form, so it's not very memory-efficient. And it's probably not that cache efficient either.

Another solution is to layout each row's data into a byte array the same way real databases serialize their rows, using only as many bytes as necessary. This is cache-friendly and memory efficie开发者_运维技巧nt, but I'm concerned about the cost of serialization/de-serialization on every access.

What's the best way?

A fourth solution would be to store each row's data as strings instead of byte arrays. This may avoid serialization costs in most cases - provided that most data will be strings.

This will also be easier to debug and will be platform independent. Of course it has some limitations: e.g. a float can not be represented as-is, but may be stored in something similar to a SQL DECIMAL format.

Any solution will be a trade-off.

EDIT However, I would prefer the byte array solution for your case: one byte-array per row. This should be most cache-friendly for fixed-size rows. But then you should also provide a solution for variable-sized rows. A low-level language seems to fit that task better, in C one could define two formats: fixed size rows where the table metadata contains column-offsets (e.g. column 1: bytes 0..31, column 2: bytes 32..127 etc.), and a second variable size row format, where the rows itself contain the columns sizes (e.g. bytes 1..3 contain the size, the following number of bytes contain the data, then another 4 bytes contain the size, following data and so on).

What is the purpose of doing this? You are likely better simply storing the data that you retrieve from your database (as the objects you map it to) in some sort of caching layer like EhCache, OSCache, memcache, etc - rather than re-inventing the wheel.

Why not use hsqldb or h2?

They both support in-memory mode and are pure Java. They force you to use SQL to access but on the other end, you don't have to implement your own join.

Both are open source, so you can also use this as a baseline for performance and see if doing your own by column/by row data structure would be faster and be worth the effort.

继续阅读：data-structures relational scala

Data Structure To Store Arbitrary Database Tables

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？