A good approach to db planing for reporting service
The scenario:
Big system (~200 tables).
60,000 users. Complex reports that will require me to do multiple queries for each report and even those will be complex queries with inner queries all over the place + some processing in PHP.The approach:
I have seen an approach, which I am not sure about:
Having one centralized, de-normalized, table that registers any activity in the system which is reportable. This table will hold mostly foreign keys, so she should be fairly compact and fast. So, for example (My system is a virtual learning management system), A user enrolls to course, the table stores the user id, date, course id, organization id, activity type (enrollment). Of course I also store this data in a normalized DB, which the actual a开发者_StackOverflow中文版pplication uses.Pros: easy, maintainable queries and code to process data and fast retrieval.
Cons: there is a danger of the de-normalized table to be out of sync with the real DB.Is this approach worth considering, or (preferably from experience) is total $#%#%t?
You need to build a data warehouse instead of just one denormalized table. Search the web for information on the star schema, dimensions, levels, fact tables. Or better yet read this book Ralph Kimball's Data Warehousing Toolikit There were some used ones for like $1.77 lol. It's the fundamental datawarehouse design book - real life advice.
i use the same approach with you right now.
some time strictly normalized database will slow down query very much. and also it harder to query. that is very true, no one can denied this condition.
some big company (google, twitter, facebook) begin to leave relational database concept. they begin to use their own database concept with (may be) so many redundancy component. but in other hand, their concept resulting in easy, and very fast query.
i think your approach is fine while you can always ensure that every change of the database will also checked at application level.
best regards
Normalization is an academic concept. Very useful but useless to stick to it all the time. Transactions are the way to avoid inconsistencies. Make use of redundancy if it fulfills your needs of a simpler, more efficient query, like you can have one instead of 10+ tables.
精彩评论