How are document-based datastores (e.g., Mongo) implemented vs a key-value store?
I've been reading a bit lately on document-based databases vs. key-value stores (here's a good overview: Difference between Document-based and Key/Value-based databases?) and I'm having trouble finding good info on the following.
If we query either of these with the key (or an additional index), there's no real difference in the mechanics - get the value. I'm not clear on how a document store is that different from a key-value store when querying non-indexed documents/fields. If I were to implement a document store on top of a key-value store, I'd do a 'table scan' (check all key/value pairs) for the appropriate value in the query - do document stores do more than this under the covers? Is it appropriate to think of document data stores in this fashion?
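To make the 'table scan' idea concrete, here is a minimal sketch of what I mean, assuming the key-value store is just a mapping from keys to opaque JSON strings. The class name `NaiveDocumentStore` and its methods are hypothetical and not taken from any real product:

```python
import json


class NaiveDocumentStore:
    """Hypothetical document layer over a plain key-value mapping."""

    def __init__(self):
        # The dict stands in for any key-value store; values are opaque strings.
        self._kv = {}

    def put(self, key, document):
        self._kv[key] = json.dumps(document)

    def get(self, key):
        # Lookup by key: identical to what the key-value store already offers.
        raw = self._kv.get(key)
        return None if raw is None else json.loads(raw)

    def find(self, field, value):
        """Full scan: decode every value and compare one top-level field."""
        matches = []
        for key, raw in self._kv.items():
            doc = json.loads(raw)
            if doc.get(field) == value:
                matches.append(key)
        return matches


store = NaiveDocumentStore()
store.put("u1", {"name": "ann", "city": "Oslo"})
store.put("u2", {"name": "bob", "city": "Rome"})
store.put("u3", {"name": "cat", "city": "Oslo"})
oslo_keys = store.find("city", "Oslo")
```

The cost of `find` grows with the total number of stored pairs, which is the part I'm asking about.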
This is less of a practical question (would I use Mongo over a BDB if I needed to do something useful, most likely) than one aimed at understanding the underlying technology. I'm interested in the scaling aspects of particular systems only if they are applicable to the underlying implementation.
MongoDB and CouchDB store data as JSON documents (MongoDB uses BSON (spec), a binary encoding of JSON). They have optimized algorithms for querying a particular value of an object and, as far as I know, they use B-trees for their indexes (MongoDB certainly does). With an index on the queried field, they can locate the data far faster than by searching through the values of a key-value database.
(Among key-value implementations, Redis has an interesting way of increasing performance: it keeps the data in memory and does little disk I/O.)
Edit:
Came across a great video in which the internals of MongoDB are explained. Check it out.
All of them use B-tree and hash indices to speed up certain queries. A key-value store basically just accesses the key, which, depending on the engine, may be treated as a single value (allowing selection and range queries) or as a composite.
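As a rough illustration of why both kinds of index exist, here is a small sketch. It assumes an in-memory collection, and it uses a sorted list with binary search as a stand-in for a real B-tree; the names `hash_index`, `ordered_index`, `find_equal` and `find_range` are made up for this example and do not come from any engine:

```python
import bisect
from collections import defaultdict

docs = {
    "u1": {"name": "ann", "age": 31},
    "u2": {"name": "bob", "age": 25},
    "u3": {"name": "cat", "age": 31},
    "u4": {"name": "dan", "age": 40},
}

# Hash index: field value -> set of document keys. Supports equality only.
hash_index = defaultdict(set)
# Ordered index: sorted (field value, key) pairs, standing in for a B-tree.
ordered_index = []
for key, doc in docs.items():
    hash_index[doc["age"]].add(key)
    bisect.insort(ordered_index, (doc["age"], key))


def find_equal(age):
    """Equality lookup through the hash index, no scan of the documents."""
    return sorted(hash_index.get(age, ()))


def find_range(low, high):
    """Keys of documents with low <= age <= high, via binary search."""
    start = bisect.bisect_left(ordered_index, (low,))
    result = []
    for value, key in ordered_index[start:]:
        if value > high:
            break
        result.append(key)
    return result
```

The hash index answers "age equals 31" directly but cannot help with "age between 30 and 40". The ordered structure handles both, which is why range queries depend on the key (or indexed field) being kept in sorted order.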
Document-based engines add support for element paths within the document (or whatever they conceptually call it). Basically, you can emulate a key-value store by creating a document {key, value} out of each key-value pair. If you only query for documents by the key, you get basically the same result and similar lookup optimizations.
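A small sketch of what an element path adds, assuming documents are nested dicts and paths are dot-separated strings. The helpers `get_path` and `find_by_path` are hypothetical and only illustrate the idea; real engines have their own path syntax and semantics:

```python
def get_path(document, path):
    """Resolve a dotted path such as 'address.city'; return None if absent."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find_by_path(collection, path, value):
    """Return the documents whose value at the given path equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


collection = [
    {"_id": "u1", "address": {"city": "Oslo"}},
    {"_id": "u2", "address": {"city": "Rome"}},
    # A key-value pair emulated as a two-field document.
    {"key": "session:42", "value": "opaque-bytes"},
]

oslo_docs = find_by_path(collection, "address.city", "Oslo")
session = find_by_path(collection, "key", "session:42")
```

The last document shows the emulation: querying by the path "key" behaves like a key lookup, while "address.city" reaches inside a value, which a plain key-value store treats as opaque.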
To find information about MongoDB's internals, you can search their site for "internals" (https://www.mongodb.com/search?search=internals). Plenty of information can be found there.
An interest in scalability means you have to consider the usage scenario carefully in the design. There are several variables to take into account for a scalable NoSQL deployment, and they apply whether the underlying implementation is key-based or document-oriented. Here's a short list:
Aspects to take into account:
-Frequency of write vs read ops
-Need for data analysis
-Data redundancy for high availability
-Data replication / synchronization
-Need for large amounts of transient data
-Data size
-Cloud-ready
Some NoSQL implementations handle particular aspects from this list better than others.
Scenarios:
-Frequently-written, rarely read data like web hit counters, or data from logging devices: Redis | MongoDB
-Frequently-read, rarely written/updated: Memcached for transient data caching, Cassandra | HBase for searching, and Hadoop and Hive for data analysis
-High-availability applications which demand minimal downtime do well with clustered, redundant data stores: Riak | Cassandra
-Data synchronization across multiple locations: CouchDB
-Transient data (web sessions & caches) do well in transient key-value data stores: Memcached
-Big data arising from business or web analytics that may not follow any apparent schema: Hadoop
Conclusion:
IMHO, you should approach the problem of choosing a scalable data store starting from the usage scenario rather than from the underlying aspects and differences between the stores.
I also recommend checking out Couchbase, which is a nice combination of the two worlds: key-based and document-oriented.