How do I avoid complex joins in star schema?

2023-01-05 14:05 问答作者：

My fact table holds a user score in a course he took. Some of the details of the course, which I have to show on the report, comes from more then one table (in the actual OLTP db).

Do I create a none normalize开发者_如何学编程d version of that course entry in a dimension table?

Or do I just join the fact table directly to the course table join to the other tables that describe this course (course_type,faculty who created this course etc)

Snowflaking or bridge tables do make the joins more complicated, and not just from a coding perspective, it also makes it less simple for BI users.

In most cases, I would put these directly in existing or additional dimension tables.

For instance, you have a scores fact table, which has the user details in a dimension which may or may not hold demographics on the user (perhaps it's only a bridge). Sometimes it is better to split out demographic information. So even though the gender and age might be associated with a user entity, in the dimensional model, these might be individual dimensions or lumped into a single dimension - all depending on the usage scenarios.

Perhaps your scores are attached to a state and states have regions (snowflake). It might be far more efficient for analysis to have the region dimension linked directly instead of going through the state dimension.

I think what you will find is that the dimensional model is a very pragmatic denormalization approach. The main things which are non-negotiable are the facts - after that the choice of dimensions is very much informed by the behavior of the data, your foresight for common usage scenarios - and avoiding falling into the too few dimensions and too many dimensions problems.

Maybe I do not understand your question, but a fact table in a star schema is supposed to be joined to dimension tables surrounding it. If you do not feel like making joins, simply create a view, and use the view for reporting.

If you were to post a model (schema), it would be easier to comment/help.

It is a common practice to consolidate several dimensions together, sacrificing normalization in favor of performance. This is usually done when your typical query will need all dimensions together (as opposed to using different bits for different use cases).

Also remember that while you receive a reduction in join overhead, there are some drawbacks:

Loss of flexibility, which might hinder development as the warehouse expands
Full table scans take longer (in traditional row-based RDBMS such as SQL Server)
Disk space consumption

You will have to consider each case separately.

It might be worthwhile to also consider the option of creating a materialized view, if such ability is offered by your RDBMS.

We commonly have a snowflake schema as the physical DWH design, but add a reporting view layer that flattens the snowflake schema into a star schema.

This way your OLAP cube becomes much simpler adn easier to manage.

继续阅读：data-warehouse join star-schema

How do I avoid complex joins in star schema?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？