What is pre-distilled data or data aggregated at runtime, and why is MongoDB not good at it?

2023-03-20 09:27 问答作者：

What is an example of data that is "predistilled or aggregated in runtime"? (And why isn't MongoDB very good with it?)

This is a quote from the MongoDB docs:

Traditional Business Intelligence. Data warehouses are more suited 开发者_JS百科to new, problem-specific BI databases. However note that MongoDB can work very well for several reporting and analytics problems where data is pre-distilled or aggregated in runtime -- but classic, nightly batch load business intelligence, while possible, is not necessarily a sweet spot.

Let's take something simple like counting clicks. There are a few ways to report on clicks.

Store the clicks in a single place. (file, database table, collection) When somebody wants stats, you run a query on that table and aggregate the results. Of course, this doesn't scale very well, so typically you use...
Batch jobs. Store your clicks as in #1, but only summarize them every 5 minutes or so. When people want to query the summary table. Note that "clicks" may have millions of rows, but "summary" may only have a few thousand rows, so it's much quicker to query.
Count the clicks in real-time. Every time there's a click you increment a counter somewhere. Typically this means incrementing the "summary" table(s).

Now most big systems use #2. There are several systems that are very good for this specifically (see Hadoop).

#3 is difficult to do with SQL databases (like MySQL), because there's a lot of disk locking happening. However, MongoDB isn't constantly locking the disk and tends to have much better write throughput.

So MongoDB ends up being very good for such "real-time counters". This is what they mean by predistilled or aggregated in runtime.

But if MongoDB has great write throughput, shouldn't it be good at doing batch jobs?

In theory, this may be true and MongoDB does support Map/Reduce. However, MongoDB's Map/Reduce is currently quite slow and not on par with other Map/Reduce engines like Hadoop. On top of that, the Business Intelligence (BI) field is filled with many other tools that are very specific and likely better-suited than MongoDB.

What is an example of data that is "predistilled or aggregated in runtime"?

Example of this can be any report that require data from multiple collections.

And why isn't MongoDB very good with it?

In document databases you can't make a join and because of this it hard to build reports. Usually reports it's data aggregating from many tables/collections.

And since mongodb (and document database in general) good fit for data distribution and denormalization better to prebuild reports whenever it possible and just display data from this collection in runtime.

For some tasks/reports it is not possible to prebuild data, in this case mongodb give to you map/reduce, grouping, etc.

继续阅读：business-intelligence mongodb

What is pre-distilled data or data aggregated at runtime, and why is MongoDB not good at it?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？