
PostgreSQL and retrieving real-time business statistics leads to queries that take too long: is there a solution?

We have a national application & the users would like to have accurate business statistics regarding some tables.

We are using Tomcat, Spring WS and Hibernate on top of that.

We have thought of several solutions:

  1. a plain old query for each user request. The problem is that those tables contain millions of records, so every query takes at least several seconds. This solution was never used.

  2. the solution actually in use: triggers. But they are painful to create and difficult to maintain (no OO, no cool IDE, no real debugging). The only helpful part is the possibility of writing JUnit tests at a higher level to verify the expected result. And for each different statistic on a table we have to create another trigger for that table (a sketch of this approach follows the list below).

  3. using the Quartz framework to consolidate the data every X minutes.
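For what it's worth, here is a minimal sketch of what option 2 tends to look like in PostgreSQL, assuming the employees/phones example described further down (the table, column and function names are illustrative, not our real schema): a one-row summary table that a trigger keeps current, so reading the statistic is a single-row lookup.

```sql
-- Illustrative one-row summary table holding running totals.
CREATE TABLE phone_stats (
    id           int PRIMARY KEY DEFAULT 1,
    employee_cnt bigint NOT NULL DEFAULT 0,
    phone_cnt    bigint NOT NULL DEFAULT 0
);
INSERT INTO phone_stats (id) VALUES (1);

-- Keep the phone count current on every insert/delete on phones.
CREATE OR REPLACE FUNCTION phones_stats_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE phone_stats SET phone_cnt = phone_cnt + 1 WHERE id = 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE phone_stats SET phone_cnt = phone_cnt - 1 WHERE id = 1;
    END IF;
    RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER phones_stats
AFTER INSERT OR DELETE ON phones
FOR EACH ROW EXECUTE PROCEDURE phones_stats_trg();

-- The "average phones per employee" statistic is then a cheap read:
SELECT phone_cnt::numeric / NULLIF(employee_cnt, 0) AS avg_phones
FROM   phone_stats;
```

A matching trigger on employees would maintain employee_cnt, and every additional statistic needs its own maintenance logic, which is exactly the burden described in point 2; the single summary row also becomes a write hotspot under heavy insert load.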

I have learned that databases are not designed for these heavy and complicated queries.

A separate data warehouse optimized for read-only queries would be better (OLAP?). But I don't have any clue where to start with PostgreSQL. (Is Pentaho the whole solution or just one part of it?)

  1. How could we extract data from the production database? Using some kind of extractor?
  2. And when? Every night?
  3. If it is done periodically, how will we manage to keep near-real-time statistics if the data are only dumped into our data warehouse once per day? (One possible shape is sketched below.)
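On questions 1 and 2, one common shape, offered only as a sketch and assuming the employees/phones example below with hypothetical birth_date, hired_on and employee_id columns, is to pre-aggregate into a structure that readers hit instead of the big tables, refreshed by whatever scheduler is already in place (Quartz, cron):

```sql
-- Per-employee snapshot, rebuilt periodically by a scheduled job.
-- (CREATE MATERIALIZED VIEW requires PostgreSQL 9.3+; on older versions,
--  use a plain table that the job truncates and repopulates.)
CREATE MATERIALIZED VIEW employee_phone_summary AS
SELECT e.id        AS employee_id,
       e.birth_date,
       e.hired_on,
       count(p.id) AS phone_count
FROM   employees e
LEFT JOIN phones p ON p.employee_id = e.id
GROUP BY e.id, e.birth_date, e.hired_on;

-- Run from the scheduler every X minutes; user-facing statistics are
-- then computed against this much cheaper snapshot.
REFRESH MATERIALIZED VIEW employee_phone_summary;
```

Question 3 is really a question about acceptable staleness: if the refresh runs every few minutes instead of nightly, the statistics are near real time without any user query ever touching the 30,000,000-row table.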


"I have learn that databases are NOT DESIGNED for these heavy and complicated queries." Well you need to unlearn that. A database was designed for just these type of queries. I would blame bad design of the software you are using before I would blame the core technology.


It seems I have been misunderstood.

For those who think that a classic database is designed to process real-time statistics with queries over billions of rows: they might want to read about the origins of OLAP, and ask why anyone bothered to design products around it if the answer to the performance problem were just a matter of design.

"I would blame bad design of the software you are using before I would blame the core technology." By the way, im not using any software (or pgadmin counts ?). I have two basic tables, you cant make it more simple,and the problem comes when you have billions datas to retreve for statistics.

For those who think it is just a design problem, I'm glad to hear their clever answer (not triggers, I already know that one) to a simple problem: imagine you have 2 tables, employees and phones. An employee may have 0 to N phones. Now let's say you have 10,000,000 employees and 30,000,000 phones.

Your final users want to know, in real time:
  1. the average number of phones per employee
  2. the average age of employees who have more than 3 phones
  3. the average number of phones for employees who have been with the company for more than 10 years

You potentially have 100 users who want those real-time statistics at any time.

Of course, no query should take more than a quarter of a second. (The naive SQL for these statistics is sketched below.)
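Written naively against the raw tables, the three statistics look roughly like the following (birth_date, hired_on and employee_id are assumed column names; the question does not give the exact schema). Each one walks millions of rows, which is precisely what blows the quarter-second budget:

```sql
-- 1. Average number of phones per employee
SELECT count(p.id)::numeric / count(DISTINCT e.id) AS avg_phones_per_employee
FROM   employees e
LEFT JOIN phones p ON p.employee_id = e.id;

-- 2. Average age of employees who have more than 3 phones
SELECT avg(date_part('year', age(e.birth_date))) AS avg_age
FROM   employees e
JOIN  (SELECT employee_id
       FROM   phones
       GROUP BY employee_id
       HAVING count(*) > 3) many ON many.employee_id = e.id;

-- 3. Average number of phones for employees with more than 10 years of service
SELECT avg(per_emp.cnt) AS avg_phones
FROM  (SELECT e.id, count(p.id) AS cnt
       FROM   employees e
       LEFT JOIN phones p ON p.employee_id = e.id
       WHERE  e.hired_on < now() - interval '10 years'
       GROUP BY e.id) per_emp;
```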


Incrementally summarize the data. The frequency depends on your requirements, and in extreme cases you may need more hardware, but that is very unlikely. Roughly (see the sketch after this list):

  1. Bulk load new data
  2. Calculate new status [delta] using new data and existing status
  3. Merge/update status
  4. Insert new data into permanent table (if necessary)
  5. NOTIFY wegotsnewdata
  6. Commit
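Here is a minimal sketch of those six steps in plain PostgreSQL, again borrowing the employees/phones example and the phone_stats summary table from the trigger sketch above, plus a hypothetical staging table phones_incoming (load it with server-side COPY or \copy from the client; the file path is a placeholder):

```sql
BEGIN;

-- 1. Bulk load the new rows into a staging table.
COPY phones_incoming FROM '/path/to/new_phones.csv' WITH (FORMAT csv);

-- 2./3. Compute the delta from the new rows and merge it into the running totals.
UPDATE phone_stats
SET    phone_cnt = phone_cnt + (SELECT count(*) FROM phones_incoming)
WHERE  id = 1;

-- 4. Move the new rows into the permanent table.
INSERT INTO phones SELECT * FROM phones_incoming;
TRUNCATE phones_incoming;

-- 5. Tell listening clients that fresh numbers are available
--    (the notification is delivered to LISTENers only when the transaction commits).
NOTIFY wegotsnewdata;

-- 6. Make everything visible atomically.
COMMIT;
```

The application side then does LISTEN wegotsnewdata and re-reads the summary table when the notification arrives, so serving the statistics stays a single-row read.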

StarShip3000 is correct, btw.
