NoSql Referential Data
Disclaimer: by referential data, I do not mean referential integrity.
I am learning NoSQL and would like to understand how data should be modeled. In a typical relational database for a CMS application, for example, you may have two tables: article and author, where an article has a reference to its author.
In a NoSQL system, you might create an article document this way, since documents are just disguised object graphs:
{
title: "Learn nosql in 5 minutes",
slug: "nosql_is_easy",
author: {firstName: "Smarty",
         lastName: "Pants"}
}
{
title: "Death to RDBMS",
slug: "rdbms_sucks",
author: {firstName: "Smarty",
         lastName: "Pants"}
}
and so on...
Say one day Mr. Smarty Pants decides to change his name to Regular Joe because NoSQL has become ubiquitous. In such a use case, every article would need to be scanned and the author's name updated.
So my question is: how should the data be modeled in NoSQL to fit the basic use cases for a CMS so that performance is on par with or faster than an RDBMS? MongoDB, for example, claims a CMS as a use case...
Edit:
A few people have already suggested normalizing the data, like:
article
{
title: "Death to RDBMS",
slug: "rdbms_sucks",
author: {id: "10000001"}
}
author
{
name: "Big Brother",
id: "10000001"
}
However, since NoSQL, by design, lacks joins, you would have to use map-reduce-like functions to bring the data together. If this is your suggestion, please comment on the performance of such an operation.
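To make concrete what the normalized design implies, here is a sketch of the resulting application-side "join": fetch the article, then fetch its author by id in a second lookup. The collections are simulated below as plain arrays so the sketch is self-contained; with a real driver each `filter` would be a query, i.e. a second round trip.

```javascript
// Simulated collections; in a document store each would be its own collection.
var articles = [
  {title: "Death to RDBMS", slug: "rdbms_sucks", author: {id: "10000001"}}
];
var authors = [
  {name: "Big Brother", id: "10000001"}
];

// An application-side "join": two lookups, merged in application code.
// This is the extra round trip that a relational JOIN would avoid.
function findArticleWithAuthor(slug) {
  var article = articles.filter(function(a) { return a.slug === slug; })[0];
  var author = authors.filter(function(a) { return a.id === article.author.id; })[0];
  return {title: article.title, slug: article.slug, author: author};
}

console.log(findArticleWithAuthor("rdbms_sucks").author.name); // "Big Brother"
```

The cost is not map-reduce here, just N+1 queries: one for the article plus one per referenced document.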
Edit 2:
If you think NoSQL is not a suitable solution for any kind of data that requires references, please also explain why. This would seem to make the use case for NoSQL rather limited, since any reasonable application will contain relational data.
Edit 3:
Nosql doesn't mean non-relational
Your data is clearly relational: an article has an author. You can model your data in a NOSQL store like MongoDB in just the same way as you would in a relational store BUT because there are no joins in the database you have to make two calls to the database so you haven't gained anything.
BUT ... what you CAN do with a NOSQL store is to denormalize the data somewhat to get improved performance (a single round trip to get everything you need to display the article) BUT at the expense of immediate consistency: trading off always accurate author names for eventually accurate author names.
You might for example, use this in your article:
author: {firstName: "Smarty", lastName: "Pants", _id: "DE342624EF"}
Now you can display the article really fast and when someone does change their name you can either kick off a background task to update all the existing articles or you can wait for a periodic consistency sweep to fix it.
Many major web sites no longer give you immediate consistency. There are changes that you make that are only eventually seen by the other users on the site.
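The background task or consistency sweep mentioned above can be sketched as follows. This is an illustrative in-memory version (the function name and data are mine, not from the answer); against a real store the loop body would become a bulk update keyed on the embedded `author._id`.

```javascript
// Denormalized articles: each embeds a copy of the author's name
// plus the author's stable _id, which the sweep matches on.
var articles = [
  {title: "Learn nosql in 5 minutes",
   author: {firstName: "Smarty", lastName: "Pants", _id: "DE342624EF"}},
  {title: "Death to RDBMS",
   author: {firstName: "Smarty", lastName: "Pants", _id: "DE342624EF"}}
];

// The "consistency sweep": rewrite the embedded name wherever the
// author _id matches, and report how many documents were repaired.
function renameAuthor(docs, authorId, firstName, lastName) {
  var touched = 0;
  docs.forEach(function(doc) {
    if (doc.author && doc.author._id === authorId) {
      doc.author.firstName = firstName;
      doc.author.lastName = lastName;
      touched++;
    }
  });
  return touched;
}

console.log(renameAuthor(articles, "DE342624EF", "Regular", "Joe")); // 2
```

Until the sweep runs, readers may see the old name: that is the eventual consistency being traded for fast single-document reads.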
I suppose CouchDB is a NoSQL database, if you say so.
But really, we have general-purpose programming languages, and domain-specific languages. Similarly, CouchDB is a domain-specific database.
I use CouchDB a lot but I really don't care whether it uses SQL or NoSQL. CouchDB is valuable (to me) because the API is 100% HTTP, JSON, and Javascript. You can build web applications with the browser fetching HTML from CouchDB and then querying for data over AJAX. To say this is "not SQL" is an understatement!
Anyway, back to Smarty Pants and Regular Joe. Maybe he has 100,000 documents. What if we just updated them all, the hard way? Well, that is a pretty small amount of Javascript.
$.getJSON('/db/_design/cms/_view/by_user?key="Smarty+Pants"', function(result) {
    // Change the name right here, in the result objects.
    var docs = result.rows.map(function(row) {
        row.value.firstName = "Regular";
        row.value.lastName = "Joe";
        return row.value;
    });
    // Store it! _bulk_docs expects a JSON body, so use $.ajax
    // rather than $.post (which would send form-encoded data).
    $.ajax({
        type: "POST",
        url: '/db/_bulk_docs',
        contentType: "application/json",
        data: JSON.stringify({"docs": docs}),
        success: function() {
            console.log("Done! Renamed Smarty Pants in " + docs.length + " documents!");
        }
    });
});
Yes, this technique would get you an F in computer science class. However, I like it. I would write this code in Firebug. In my browser. The rename is not atomic and it has no referential integrity. On the other hand, it would probably complete in a couple of seconds and nobody would care.
You might say CouchDB fails at the buzzwords and benchmarks but aces the school of hard knocks.
P.S. The by_user view is built from map-reduce. In CouchDB, map-reduce is incremental, which means it performs like most SQL indexes. All queries finish in a short, predictable (logarithmic) time.
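For concreteness, the by_user view could be backed by a map function along these lines (this is my guess at the design document, not code from the answer). In CouchDB it would live in `_design/cms`; the tiny `emit()` stub below just lets the sketch run standalone.

```javascript
// Stub of CouchDB's emit(): collects rows the way the view index would.
var emitted = [];
function emit(key, value) { emitted.push({key: key, value: value}); }

// Map function: key each article by the author's full name, so that
// /_view/by_user?key="Smarty Pants" returns all of that author's docs.
function byUserMap(doc) {
  if (doc.author && doc.author.firstName) {
    emit(doc.author.firstName + " " + doc.author.lastName, doc);
  }
}

byUserMap({title: "Death to RDBMS",
           author: {firstName: "Smarty", lastName: "Pants"}});
console.log(emitted[0].key); // "Smarty Pants"
```

Because CouchDB only re-runs the map function on documents that changed, the index stays cheap to maintain as articles are added or edited.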
For your specific case, use the Flyweight pattern: store an object id instead of the object entity.
article
{
title: "Death to RDBMS",
slug: "rdbms_sucks",
author: {id: "10000001"}
}
author
{
name: "Big Brother",
id: "10000001"
}
For general MongoDB schema design suggestions, read the official documentation.
Allow me to state that I am not an expert with NoSQL by any means. Instead, my knowledge of it is mostly theoretical.
That said, I'm a firm believer that implementing a CMS-type system like this in NoSQL is probably not the best way to go about things, since the data is primarily relational.
My take on this problem, is based on the presumption that the NoSQL system you're using allows for loading records by way of a "primary key"-type structure. I think most do, but I'm sure there are some that don't.
That said, I would suggest storing the data in the following fashion.
For the author:
{
_KEY: $AUTHOR_GUID,
firstName: "Smarty",
lastName: "Pants"
}
And for the post itself:
{
title: "Learn nosql in 5 minutes",
slug: "nosql_is_easy",
author: $AUTHOR_GUID
}
Note that in the above, I use _KEY to represent that this is the "primary key"-type value.
After loading the post, you can then load the author by its GUID.
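The two-step load described above can be sketched with a `Map` standing in for the key-value store (the key names are illustrative only):

```javascript
// Simulated key-value store; get() plays the role of a primary-key lookup.
var store = new Map();
store.set("author:42", {firstName: "Smarty", lastName: "Pants"});
store.set("post:rdbms_sucks", {title: "Death to RDBMS", author: "author:42"});

// Step 1: load the post. Step 2: follow its author key with a second get().
var post = store.get("post:rdbms_sucks");
var author = store.get(post.author);
console.log(author.firstName); // "Smarty"
```

When the author renames himself, only the single author record changes; the trade-off is the second lookup on every read.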
This post has been here for some time, but I thought I'd point out another method to handle "joins" and cross-document references with CouchDB. This is a method I employ in the CMS I am (re)writing to use CouchDB (formerly it was written for MySQL).
The CMS is named BlueInk and can be found on Github at http://github.com/BigBlueHat/BlueInk At present the rewrite is focused on the document design and "rendering engine" portion, so there's no UI to speak of--you have to craft all the JSON by hand. That's something I hope to fix soon, but there's already enough in the repo (once installed into CouchDB) to give you an idea of how "joins" are done.
In BlueInk a page references content items which can themselves be included in one or more pages (or the same page multiple times). The page references the page items via their ID (as in your second JSON example). When run via the "page_and_items" view it will generate output that can be used with CouchDB's ?include_docs=true
query parameter to pull in the full contents of the content items references within the page document.
The view output is then passed through a _list
function and formatted via a Mustache template and output as an HTML page--all in a single GET request.
This same pattern of using referenced IDs with ?include_docs=true
can be used in your use case above. The use of a _list
function is completely "cosmetic," but it can be helpful for restructuring the output view JSON or templating it and outputting HTML, CSV, XML, etc.
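The referenced-ID pattern behind `?include_docs=true` is worth spelling out: the view emits `{_id: ...}` values for the documents it wants joined in, and CouchDB attaches each referenced document to its row. Below is a sketch of such a map function (the field names and the `emit()` stub are mine; BlueInk's actual "page_and_items" view may differ):

```javascript
// Stub of CouchDB's emit(): collects the rows the view would produce.
var emitted = [];
function emit(key, value) { emitted.push({key: key, value: value}); }

// For each page, emit the page itself first, then one row per referenced
// item id. A value of {_id: itemId} tells ?include_docs=true to fetch
// and attach that document to the row, a single-request "join".
function pageAndItemsMap(doc) {
  if (doc.type === "page") {
    emit([doc._id, 0], doc);
    (doc.items || []).forEach(function(itemId, i) {
      emit([doc._id, i + 1], {_id: itemId});
    });
  }
}

pageAndItemsMap({_id: "home", type: "page", items: ["item_a", "item_b"]});
console.log(emitted.length); // 3
```

The compound `[pageId, order]` key keeps a page's rows contiguous and ordered, so one ranged GET returns the page and all of its content items together.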
You can model your data just fine with playOrm AND do joins in a NoSQL store. playOrm has S-SQL (Scalable SQL), which is a twist on SQL in that you specify which partitions you are querying. In this way, you can move from a DBMS to NoSQL and still have the same familiar tools you were used to.