Mixing column and row oriented databases?

2023-02-13 04:43 问答作者：

I am currently trying to improve the performance of a web application. The goal of the application is to provide (real time) analytics. We have a database mod开发者_StackOverflowel that is similiar to a star schema, few fact tables and many dimensional tables. The database is running with Mysql and MyIsam engine.

The Fact table size can easily go into the upper millions and some dimension tables can also reach the millions.

Now the point is, select queries can get awfully slow if the dimension tables get joined on the fact tables and also aggretations are done. First thing that comes in mind when hearing this is, why not precalculate the data? This is not possible because the users are allowed to use several freely customizable filters.

So what I need is an all-in-one system suitable for every purpose ;) Sadly it wasn't invented yet. So I came to the idea to combine 2 existing systems. Mixing a row oriented and a column oriented database (e.g. like infinidb or infobright). Keeping the mysql MyIsam solution (for fast inserts and row based queries) and add a column oriented database (for fast aggregation operations on few columns) to it and fill it periodically (nightly) via cronjob. Problem would be when the current data (it must be real time) is queried, therefore I maybe would need to get data from both databases which can complicate things.

First tests with infinidb showed really good performance on aggregation of a few columns, so I really think this could help me speed up the application.

So the question is, is this a good idea? Has somebody maybe already done this? Maybe there is are better ways to do it.

I have no experience in column oriented databases yet and I'm also not sure how the schema of it should look like. First tests showed good performance on the same star schema like structure but also in a big table like structure.

I hope this question fits on SO.

Greenplum, which is a proprietary (but mostly free-as-in-beer) extension to PostgreSQL, supports both column-oriented and row-oriented tables with high customizable compression. Further, you can mix settings within the same table if you expect that some parts will experience heavy transactional load while others won't. E.g., you could have the most recent year be row-oriented and uncompressed, the prior year column-oriented and quicklz-compresed, and all historical years column-oriented and bz2-compressed.

Greenplum is free for use on individual servers, but if you need to scale out with its MPP features (which are its primary selling point) it does cost significant amounts of money, as they're targeting large enterprise customers.

(Disclaimer: I've dealt with Greenplum professionally, but only in the context of evaluating their software for purchase.)

As for the issue of how to set up the schema, it's hard to say much without knowing the particulars of your data, but in general having compressed column-oriented tables should make all of your intuitions about schema design go out the window.

In particular, normalization is almost never worth the effort, and you can sometimes get big gains in performance by denormalizing to borderline-comical levels of redundancy. If the data never hits disk in an uncompressed state, you might just not care that you're repeating each customer's name 40,000 times. Infobright's compression algorithms are designed specifically for this sort of application, and it's not uncommon at all to end up with 40-to-1 ratios between the logical and physical sizes of your tables.

继续阅读：analytics column-oriented database-design infobright performance

Mixing column and row oriented databases?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？