Reading/Writing/Storing extremely large sets of sequential data
I am interacting with large sequential sets of data in Java. Ideally, I'm searching for a library where I can store streaming data (think sequences of immutable objects) and then jump around through the saved data later. The data should ultimately be stored on disk and shouldn't be held in memory in its entirety. The data would be states of mathematical systems -- so predominantly numbers (doubles, or even BigDecimals) as well as some strings.
At the moment this is for a desktop application, so there would only be one user and maybe a few concurrent connections at a time (several streams of objects/states). Later I may consider a distributed approach and support for multiple clients on the same database backend.
I've been looking at various NoSQL libraries but I am not sure what's right for my needs. Any thoughts?
Take a look at OrientDB: it's very fast for insertions. On my notebook it inserts 1,000,000 entries in 6 seconds. Furthermore, it's written in Java and can run embedded in your process.
If you have any means of calculating the offset for each object you want to access, a simple java.nio.MappedByteBuffer (the equivalent of mmap) might do the job.
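As a minimal sketch of that idea, assuming each state is a fixed number of doubles (the class name, file name, and record layout here are illustrative, not from the question):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedStates {
    // Each state is a fixed number of doubles, so its byte offset is computable.
    static final int DOUBLES_PER_STATE = 4;
    static final int BYTES_PER_STATE = DOUBLES_PER_STATE * Double.BYTES;

    public static void main(String[] args) throws IOException {
        Path file = Path.of("states.bin");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map room for 1000 states; the OS pages data in and out on demand.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1000L * BYTES_PER_STATE);

            // Write state #42 at its computed offset.
            map.position(42 * BYTES_PER_STATE);
            for (int i = 0; i < DOUBLES_PER_STATE; i++) {
                map.putDouble(i * 0.5);
            }

            // Jump straight back to it later.
            map.position(42 * BYTES_PER_STATE);
            System.out.println(map.getDouble()); // prints 0.0
        }
    }
}
```

Since the record size is fixed, `index * BYTES_PER_STATE` is the whole "offset calculation", and the OS handles paging and caching.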
If you have a 64-bit JVM you can memory-map the files into memory. This gives you a window of up to 2 GB into each file (a single MappedByteBuffer is indexed by int, so it tops out at 2 GB).
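For files larger than the 2 GB per-buffer limit, you can map a series of windows and route each absolute offset to the right one. A rough sketch (window size and class name are my own choices, and a real version should cache the mapped windows instead of re-mapping on every read):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Views a file larger than 2 GB as a series of memory-mapped windows. */
public class WindowedFile {
    // A single MappedByteBuffer is indexed by int, so keep each window below 2 GB.
    static final long WINDOW_SIZE = 1L << 30; // 1 GB windows (arbitrary choice)

    private final FileChannel channel;

    public WindowedFile(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    /** Reads one double at an absolute byte offset anywhere in the file. */
    public double readDouble(long offset) throws IOException {
        long windowStart = (offset / WINDOW_SIZE) * WINDOW_SIZE;
        long length = Math.min(WINDOW_SIZE, channel.size() - windowStart);
        MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, windowStart, length);
        return window.getDouble((int) (offset - windowStart));
    }
}
```

Since the window size is a multiple of 8, 8-byte-aligned doubles never straddle a window boundary; variable-size records would need extra care at the edges.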
When you have multiple clients, you could have a server process which has access to the files or database and caches/distributes data to the clients.
Just use a binary file? It's easy if your objects are equal in size: you can use random access to jump around in the file, and your operating system's disk cache gives you caching for free. Sometimes people reach for a database and SQL interface as a golden hammer.
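A small sketch of the fixed-size-record approach, using RandomAccessFile's seek() to jump to any record (the class name and the three-doubles-per-state layout are illustrative assumptions):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Fixed-size records in a plain binary file; seek() gives random access. */
public class RecordFile implements AutoCloseable {
    static final int DOUBLES_PER_STATE = 3;              // assumed record layout
    static final int RECORD_BYTES = DOUBLES_PER_STATE * Double.BYTES;

    private final RandomAccessFile file;

    public RecordFile(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    public void write(long index, double[] state) throws IOException {
        file.seek(index * RECORD_BYTES);                 // jump to record start
        for (double d : state) file.writeDouble(d);
    }

    public double[] read(long index) throws IOException {
        file.seek(index * RECORD_BYTES);
        double[] state = new double[DOUBLES_PER_STATE];
        for (int i = 0; i < DOUBLES_PER_STATE; i++) state[i] = file.readDouble();
        return state;
    }

    @Override public void close() throws IOException { file.close(); }
}
```

Because every record is the same size, record N always lives at byte `N * RECORD_BYTES`, and no index structure is needed.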
Have you looked at Berkeley DB Java Edition? It was designed with this type of use case in mind: large data sets, high write throughput, and reliable persistence, with a set of very developer-friendly Java APIs. You can use the Base API (key/value pairs), the Collections API, or the JPA-like DPL (Direct Persistence Layer) API.
There's an excellent Getting Started Guide that has examples and explains the various APIs.
There are lots of similar use cases to yours. In fact, Terracotta and Coherence both use Berkeley DB for persistence, as do Heritrix (the Internet Archive's crawler project), Tibco, and many other companies and projects. The reason is that BDB provides the performance, reliability, scalability, flexibility and simplicity that they need.
Disclaimer: I'm one of the product managers for Berkeley DB, so naturally I'm biased. But your use case sounds exactly on target with what BDB was designed to do.
Good luck with your project. Please let us know if there is anything that we can help with. You can ask questions about Berkeley DB Java Edition on the OTN Forums, where you'll find a large community of active Java application developers.
Regards,
Dave