What are the resources or tools used to manage temporal data in key-value stores?

Posted 2022-12-14 21:24

I'm considering using MongoDB or CouchDB on a project that needs to maintain historical records. But I'm not sure how difficult it will be to store historical data in these databases.

For example, in his book "Developing Time-Oriented Database Applications in SQL," Richard Snodgrass points out tools for retrieving the state of data as of a particular instant, and he points out how to create schemas that allow for robust data manipulation (i.e. data manipulation that makes invalid data entry difficult).

Are there tools or libraries out there that make it easier to query, manipulate, or define temporal/historical structures for key-value stores?

edit:

Note that from what I hear, the 'version' data that CouchDB stores is erased during normal use, and since I would need to maintain historical data, I don't think that's a viable solution.

P.S. Here's a similar question that was never answered: key-value-store-for-time-series-data

There are a couple of options if you want to store the data in MongoDB. You could store each version as a separate document; then you can query for the object at a certain time, the object at all times, objects over ranges of time, and so on. Each document would look something like:

{
    object : whatever,
    date : new Date()
}
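As a rough sketch of what the "object at a certain time" lookup amounts to, the snippet below works on an in-memory array rather than a live collection. The `objectId` field, the `versionAsOf` helper, and the sample data are assumptions made for illustration. In the shell, the equivalent would presumably be a `find` with a `$lte` filter on `date`, sorted by `date` descending and limited to one result.

```javascript
// Hypothetical version documents: one document per version, as described above.
// `objectId` is an assumed field that ties the versions of one object together.
const versions = [
  { objectId: "a", object: { x: 1 }, date: new Date("2020-01-01T00:00:00Z") },
  { objectId: "a", object: { x: 2 }, date: new Date("2020-02-01T00:00:00Z") },
  { objectId: "a", object: { x: 3 }, date: new Date("2020-03-01T00:00:00Z") },
  { objectId: "b", object: { y: 9 }, date: new Date("2020-01-15T00:00:00Z") },
];

// Return the newest version of `objectId` written at or before `asOf`,
// or null if the object did not exist yet.
function versionAsOf(docs, objectId, asOf) {
  let best = null;
  for (const doc of docs) {
    if (doc.objectId !== objectId || doc.date > asOf) continue;
    if (best === null || doc.date > best.date) best = doc;
  }
  return best;
}

const found = versionAsOf(versions, "a", new Date("2020-02-15T00:00:00Z"));
console.log(found.object);
```

The trade-off is that every change writes a full copy of the object, but reads stay simple because no reconstruction is needed.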

Alternatively, you could store all the versions of a document in the document itself, as mikeal suggested, using updates to push the object itself into a history array. In Mongo, this would look like:

db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})

// make changes to obj
...

db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})

A cooler (I think) and more space-efficient way, although less time-efficient, might be to store in the object itself a history of what changed at each point in time. Then you could replay the history to build the object as it was at a certain time. For instance, you could have:

{
    object : startingObj,
    history : [
        { date : d1, addField : { x : 3 } },
        { date : d2, changeField : { z : 7 } },
        { date : d3, removeField : "x" },
        ...
    ]
}

Then, if you wanted to see what the object looked like between time d2 and d3, you could take the startingObj, add the field x with the value 3, set the field z to the value of 7, and that would be the object at that time.
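The replay step could be sketched roughly as follows. The `replay` helper and the concrete dates are assumptions for illustration; the code also assumes the history array is kept sorted oldest first and uses only the three action names shown above.

```javascript
// Rebuild an object as of `asOf` by replaying history entries onto a copy
// of the starting object. Assumes `history` is sorted by date, oldest first.
function replay(doc, asOf) {
  const result = { ...doc.object }; // shallow copy; nested values are shared
  for (const entry of doc.history) {
    if (entry.date > asOf) break; // later changes are not visible yet
    if (entry.addField) Object.assign(result, entry.addField);
    if (entry.changeField) Object.assign(result, entry.changeField);
    if (entry.removeField) delete result[entry.removeField];
  }
  return result;
}

// Hypothetical dates standing in for d1, d2 and d3.
const d1 = new Date("2020-01-01T00:00:00Z");
const d2 = new Date("2020-02-01T00:00:00Z");
const d3 = new Date("2020-03-01T00:00:00Z");

const stored = {
  object: { z: 1 },
  history: [
    { date: d1, addField: { x: 3 } },
    { date: d2, changeField: { z: 7 } },
    { date: d3, removeField: "x" },
  ],
};

// A moment between d2 and d3: x has been added and z changed, x not yet removed.
console.log(replay(stored, new Date("2020-02-15T00:00:00Z")));
```

Replay cost grows with the length of the history, which is the "less time-efficient" part of the trade-off.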

Whenever the object changed, you could atomically push actions to the history array:

db.foo.update({object : startingObj}, {$push : {history : {date : new Date(), removeField : "x"}}})

Yes, in CouchDB the revisions of a document are there for replication and are usually lost during compaction. I think UbuntuOne did something to keep them around longer but I'm not sure exactly what they did.

I have a document that I need the historical data on and this is what I do.

In CouchDB I have an _update function. The document has a "history" attribute, which is an array. Each time I call the _update function to update the document, I append the current document (minus the history attribute) to the history array, then update the document with the changes in the request body. This way I have the entire revision history of the document.

This is a little heavy for large documents. I was investigating some JavaScript diff tools and thinking about storing only the diff between versions, but I haven't done that yet.
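A minimal sketch of such an update function might look like the one below. It is not the exact function described above. It follows the general shape of a CouchDB update handler (stored document and request in, `[document, response]` out) and assumes the request body is a JSON object of fields to change and that the document id arrives as `req.id`. The name `updateWithHistory` is only there so the function can be exercised locally; inside a design document it would be an anonymous function.

```javascript
// Sketch of an update handler: snapshot the current document into its own
// history array, then apply the changes from the request body.
// Written in ES5 style in case the CouchDB JavaScript engine is an older one.
function updateWithHistory(doc, req) {
  var changes = JSON.parse(req.body); // assumed: body is a JSON object
  var key;
  if (!doc) {
    // Nothing stored yet: start a new document with an empty history.
    doc = { _id: req.id, history: [] };
  } else {
    // Copy every attribute except `history` into a snapshot.
    var snapshot = {};
    for (key in doc) {
      if (key !== "history") snapshot[key] = doc[key];
    }
    doc.history = (doc.history || []).concat([snapshot]);
  }
  // Apply the requested changes, but never let a request overwrite the history.
  for (key in changes) {
    if (key !== "history") doc[key] = changes[key];
  }
  return [doc, "updated"];
}

// Local dry run with a fake request object (no CouchDB involved).
var saved = updateWithHistory(
  { _id: "doc1", title: "draft", history: [] },
  { id: "doc1", body: JSON.stringify({ title: "final" }) }
)[0];
console.log(saved.title, saved.history.length);
```

Each snapshot is a full copy of the previous state, which is why this gets heavy for large documents.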

http://wiki.apache.org/couchdb/How_to_intercept_document_updates_and_perform_additional_server-side_processing

Hope that helps.

I can't speak for MongoDB, but for CouchDB it all really hinges on how you write your views.

I don't know the specifics of what you need, but if you have a unique id for a document throughout its lifetime and store a timestamp in that document, then you have everything you need for robust querying of that document.

For instance:

document structure:

{ "docid" : "doc1", "ts" : <unix epoch> ...<set of key value pairs> }

map function:

function (doc) {
  // Index only documents that carry both a stable id and a timestamp.
  if (doc.docid && doc.ts) {
    emit([doc.docid, doc.ts], doc);
  }
}

The view will now output each doc and its revisions in historical order like so:

["doc1", 1234567], ["doc1", 1234568], ["doc2", 1234567], ["doc2", 1234568]

You can use view collation and start_key or end_key to restrict the returned documents.

start_key=["doc1", 1] end_key=["doc1", 9999999999999]

will return all historical copies of doc1

start_key=["doc2", 1234567] end_key=["doc2", 123456715]

will return all historical copies of doc2 between 1234567 and 123456715 unix epoch times.

See ViewCollation for more details.
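To make the key-range behaviour concrete, here is a small in-memory imitation of the view and the range query. It does not talk to CouchDB: `compareKeys`, `buildView`, `queryView`, and the sample documents are all made up for illustration, and string comparison is simplified (CouchDB's real collation uses ICU rules).

```javascript
// Compare two [docid, ts] keys: docid first, then timestamp.
// Simplification: plain string comparison instead of CouchDB's ICU collation.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// Mimic the map function: one row per document that has docid and ts,
// sorted by key the way a view index would be.
function buildView(docs) {
  return docs
    .filter((doc) => doc.docid && doc.ts)
    .map((doc) => ({ key: [doc.docid, doc.ts], value: doc }))
    .sort((r1, r2) => compareKeys(r1.key, r2.key));
}

// Return the rows whose key lies between startKey and endKey, inclusive.
function queryView(rows, startKey, endKey) {
  return rows.filter(
    (row) =>
      compareKeys(row.key, startKey) >= 0 && compareKeys(row.key, endKey) <= 0
  );
}

const rows = buildView([
  { docid: "doc2", ts: 1234568, body: "d" },
  { docid: "doc1", ts: 1234567, body: "a" },
  { docid: "doc2", ts: 1234567, body: "c" },
  { docid: "doc1", ts: 1234568, body: "b" },
]);

// All historical copies of doc1, oldest first.
const doc1History = queryView(rows, ["doc1", 1], ["doc1", 9999999999999]);
console.log(doc1History.map((row) => row.key));
```

Because the document id is the first element of the key, all copies of one document sit next to each other in the index, and the timestamp in second position orders them chronologically.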
