
External store for complex collections that can be accessed by Key-Value

Problem

I need a key-value store that can store values of the following form:

DS<DS<E>>

where the data structure DS can be either a List, SortedSet or an Array

and E can be either a String or byte-array.

It is very expensive to generate this data and so once I put it into the store, I will only perform read queries on it. Essentially it is a complex object cache with no eviction.

Example Application

A (possibly bad, but sufficient to clarify) example of an application is storing tokenized sentences from a document where you need to be able to quickly access the qth word of the pth sentence given documentID. In this case, I would be storing it as a K-V pair as follows:

K - docID
V - List<List<String>>
String word = map.get(docID).get(p).get(q);

I prefer to avoid app-integrated Map solutions (such as EhCache within Java).

I have worked with Redis but it doesn't appear to support the second layer of data-structure complexity. Any other K-V solutions that can help my use case?


Update:

I know that I could serialize/deserialize my object but I was wondering if there is any other solution.


In terms of platform choice you have two options. A full document database will support arbitrarily complex objects, but won't have built-in commands for working with specific data structures. Something like Redis, which does have optimised code for specific data structures, can't support all possible data structures.

You can actually get pretty close with Redis by using ids instead of the nested data structure. DS1<DS2<E>> becomes DS1<int> and DS2<E>, with the int from DS1 and a prefix giving you the key holding DS2.

With this structure you can access any E with only two operations. In some cases you will be able to get that down to a single operation by knowing what the id of DS2 will be for a given query.
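To make the id indirection concrete, here is a minimal sketch in Java. It does not talk to a real Redis server: a plain in-memory map stands in for it, and the helper methods `rpush` and `lindex` only mimic the Redis commands RPUSH and LINDEX. The class name and the key prefixes `doc:` and `sent:` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedListStore {
    // Stand-in for Redis: every key holds exactly one flat list of strings.
    private final Map<String, List<String>> lists = new HashMap<>();
    private long nextId = 0;

    // Mimics RPUSH key value (append to the list stored at key).
    private void rpush(String key, String value) {
        lists.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Mimics LINDEX key index for non-negative indices; null when out of range.
    private String lindex(String key, int index) {
        List<String> list = lists.get(key);
        if (list == null || index < 0 || index >= list.size()) {
            return null;
        }
        return list.get(index);
    }

    // DS1<DS2<E>> becomes DS1<id> under "doc:<docId>" plus one DS2<E> per "sent:<id>".
    void put(String docId, List<List<String>> sentences) {
        for (List<String> sentence : sentences) {
            String id = Long.toString(nextId++);
            rpush("doc:" + docId, id);
            for (String word : sentence) {
                rpush("sent:" + id, word);
            }
        }
    }

    // Two lookups: the outer list yields the id, the inner list yields the word.
    String get(String docId, int p, int q) {
        String id = lindex("doc:" + docId, p);
        return id == null ? null : lindex("sent:" + id, q);
    }

    public static void main(String[] args) {
        NestedListStore store = new NestedListStore();
        store.put("42", Arrays.asList(
                Arrays.asList("hello", "world"),
                Arrays.asList("foo", "bar", "baz")));
        System.out.println(store.get("42", 1, 2)); // prints "baz"
    }
}
```

If the inner key can be computed directly (for example `sent:<docId>:<p>` instead of a generated id), the first lookup disappears and a read is a single operation.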


I hesitate to "recommend" it, but one of the few storage engines I know of that handles multi-dimensional data of this sort efficiently is InterSystems Caché. I had to use it at my last job, mostly coding against it in its built-in MUMPS-based language. I would not recommend the native approach, unless you hate yourself or your developers. However, they do have decent Java adapters, and Java appears to be what you're using. I've seen it handle billions of records, efficiently stored in nested binary tree tables. There is no practical limit to the depth (number of dimensions) you can use. However, this is very much a proprietary solution. There is an open-source alternative called GT.M, but I don't know how compatible it is with languages that aren't M or C.


Any key-value store supports complex values; you just need to serialize/deserialize the data.
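As a rough illustration of the serialize/deserialize route, the sketch below turns a List<List<String>> into a byte array and back using Java's built-in object serialization. The class name `BlobCodec` is hypothetical, and built-in serialization is only one possible encoding (JSON or a binary format would work the same way). The resulting bytes can be stored as the value in whatever store you pick.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BlobCodec {
    // Encode the nested list as one opaque value for the store.
    static byte[] serialize(List<List<String>> value) {
        // Copy into ArrayLists so every list in the graph is known to be Serializable.
        ArrayList<ArrayList<String>> copy = new ArrayList<>();
        for (List<String> inner : value) {
            copy.add(new ArrayList<>(inner));
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(copy);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decode a value previously produced by serialize().
    @SuppressWarnings("unchecked")
    static List<List<String>> deserialize(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (List<List<String>>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost is that every read fetches and decodes the whole document, even when only one word is needed.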

If you want fast retrieval only for specific parts of the data, you could use a more complex Key. In your example this would be: K - tuple(docID, p, q)
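A minimal sketch of the composite-key idea, assuming the tuple is encoded as the string `docID:p:q`. The class name `FlatWordIndex` and the `:` separator are illustrative choices, and a HashMap stands in for the actual key-value store.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlatWordIndex {
    // Stand-in for any key-value store with string keys and string values.
    private final Map<String, String> store = new HashMap<>();

    // Encodes tuple(docID, p, q) as a single flat key.
    static String key(String docId, int p, int q) {
        return docId + ":" + p + ":" + q;
    }

    // Writes one entry per word, so the nesting lives in the key rather than the value.
    void put(String docId, List<List<String>> sentences) {
        for (int p = 0; p < sentences.size(); p++) {
            List<String> sentence = sentences.get(p);
            for (int q = 0; q < sentence.size(); q++) {
                store.put(key(docId, p, q), sentence.get(q));
            }
        }
    }

    // A single lookup returns the qth word of the pth sentence, or null if absent.
    String get(String docId, int p, int q) {
        return store.get(key(docId, p, q));
    }
}
```

With this layout each word is one lookup away, but reading a whole sentence means either issuing one get per word or storing the sentence lengths separately.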
