How does CouchDB perform for a regularly updated dataset?

I am planning on using CouchDB on a project. But since the querying mechanism involves writing views (which are a lot like indexes in a regular RDBMS), I was wondering: if the document database is updated frequently (a write-heavy database), would CouchDB perform well compared to a regular RDBMS? Or do we have to compact/re-index the system occasionally to make it perform faster?


You might think of the pros/cons of the CouchDB view model this way. (CouchDB hackers may disagree but IMO it's accurate enough for users.)

  1. A view function always performs a full "table scan" when it is first created (just like an RDBMS BTW)
  2. As long as they have no side effects, map and reduce functions can be arbitrarily complex
  3. The map/reduce result for every document is cached and not calculated again unless the document changes
  4. If you add or change a document, it (and only it) will be re-computed (and cached) for that view

Given these, you can draw some conclusions about CouchDB performance:

  • There is never a re-index phase for the entire data set, just an incremental update per changed document
  • Changing a view function forces re-building the entire index
  • Since both CouchDB and RDBMS must update the index for new data, it's reasonable to think performance will be similar for heavy update/insert usage.
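
The caching behaviour described above can be sketched as a toy model. This is only an illustration in Python, not CouchDB's actual indexer (which works differently in detail); the `ToyView` class, the `rev` field used to detect changes, and the `by_author` / `by_year` map functions are all hypothetical names made up for this example.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[Any, Any]
MapFn = Callable[[Doc], Iterable[Row]]


class ToyView:
    """Toy model of an incrementally maintained view (not CouchDB's real code)."""

    def __init__(self, map_fn: MapFn) -> None:
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[Row]] = {}  # cached map output per doc id
        self.seen_rev: Dict[str, int] = {}           # revision each cache entry was built from
        self.map_calls = 0                           # how many times map_fn actually ran

    def set_map(self, map_fn: MapFn) -> None:
        # A new view function invalidates every cached row.
        self.map_fn = map_fn
        self.rows_by_doc.clear()
        self.seen_rev.clear()

    def query(self, docs: Dict[str, Doc]) -> List[Row]:
        # Drop cached rows for documents that no longer exist.
        for doc_id in list(self.rows_by_doc):
            if doc_id not in docs:
                del self.rows_by_doc[doc_id]
                del self.seen_rev[doc_id]
        # Re-run map only for documents that are new or whose revision changed.
        for doc_id, doc in docs.items():
            if self.seen_rev.get(doc_id) != doc["rev"]:
                self.rows_by_doc[doc_id] = list(self.map_fn(doc))
                self.seen_rev[doc_id] = doc["rev"]
                self.map_calls += 1
        return sorted(row for rows in self.rows_by_doc.values() for row in rows)


def by_author(doc: Doc) -> Iterable[Row]:
    yield (doc["author"], 1)


def by_year(doc: Doc) -> Iterable[Row]:
    yield (doc["year"], 1)


docs = {
    "a": {"rev": 1, "author": "ann", "year": 2009},
    "b": {"rev": 1, "author": "bob", "year": 2010},
    "c": {"rev": 1, "author": "ann", "year": 2010},
}

view = ToyView(by_author)
view.query(docs)
first_build = view.map_calls                  # full scan: one map call per document

docs["b"] = {"rev": 2, "author": "cat", "year": 2010}
view.query(docs)
after_update = view.map_calls - first_build   # only the changed document is re-mapped

view.set_map(by_year)
view.query(docs)
after_new_fn = view.map_calls - first_build - after_update  # full rebuild

print(first_build, after_update, after_new_fn)
```

The three counters correspond to the first full scan, a single-document update, and the rebuild forced by changing the view function. In a write-heavy workload the per-write cost in this model is one map call per changed document, which is the basis for expecting performance in the same ballpark as an RDBMS updating its indexes.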

Obviously YMMV, and the standard cop-out, "you must test your own load," applies. However, I will add a few more considerations.

  • I'd say an RDBMS is flat-out superior for exploratory querying of your data. When you don't even know what questions to ask of your data, you really can't beat a structured query language.
  • However, once you define what you want to know, CouchDB (and perhaps Hadoop) provide the richest querying system because you are just writing code.
  • If your data set is large, NoSQL databases will scale more easily. For example, CouchDB-Lounge allows a cluster of couches for parallel processing. Hadoop does the same, so the choice would then come down to secondary considerations such as familiarity and maintainability. CouchDB is a web server but requires a bit more DIY; Hadoop internalizes more cluster management at the cost of complexity, foreignness, etc.

I hope that helps shed some light on your decision!
