开发者

Differences between Data Vault and Dimensional modeling?

When modeling a data warehouse, is there any reason we should favor Data Vault over Dimensional modelling? 开发者_JS百科What are the major differences between these two?


Dimensional modelling is in my opinion still the best practise for analysis & reporting and as a visible model best understand by business users.

Data Vault is more suitable for large Enterprise Data Warehousing, also recommended by Bill Inmon, but not that suitable for analysis & reporting, for that you still might need dimensional modelling for creating you "virtual" Data marts. Take a peak at some blogs like those from Martijn Evers, Hennie de Nooijer or Ronald Damhof.

Data Vault is more flexible, easier to add new sources, more audit able and keeps all data all the time so you will be able to always recreate you DM's.

So a conclusion might be that the ideal situation is to use Data Vault for your Enterprise Data Warehouse and Dimensional Modelling for you Datamarts.


I think a combination of the two would best serve most large organizations. Vault would be a good choice for an intermediate enterprise ODS where less structure would facilitate flexibility and performance. Data can then be pulled from the Vault Db to feed context specific dimensional data marts that support reporting and analysis. In that scenario the vault Db can also be used to support more big-data types of mining and analysis that require a more mature understanding of data relationships.


@Danny Shaw this is also my experience (though I am relatively new in this field - coming from ETL, so curious to input from others on my post).

I believe that it is important, to respect that your Clients' demands are evolving with their 'maturity', and that different models may fit better at different times.

My feeling is that Data Vault delivers operational flexibility, whereas existing discussion (Kimball/Inmon) revolves more around 'business flexibility' (for lack of better terminology).

Data Vault allows you to stay close to the source in terms of its granular objects. This makes the model 'auditable' and scalable. It helps with flexibility on SOURCE specifications.

Therefore it is a useful in-between in e.g. migration projects, serving as a base from where to feed more business-oriented DWH/Datamarts that require an integrated view of both old and new. My experience however is that if you start populating Datamarts directly from this model you end up with lots of joins and especially also recursions simply because you are far from the business concepts. Not entirely bad on certain databases so the choice is partly influenced by the software (e.g. Teradata likes joining much more so than Oracle). However generally my feeling is that if you need flexibility in TARGET (business) side, you end up in the inmon-kimball discussion and it would not be a bad start to consider dimensional modeling instead of data vault on that side.

So part of the input in your evaluation should also be: how standardised are business concepts? Is the whole company using the same KPIs and Data concepts? If this is not the case, staying close to the source (especially if there are many) somewhere in your data warehouse seems like a safe bet to me. If more mature, prepare for more flexibility in reporting demands - and shift the performance of your datamodel to the reporting side.

This is not to say that business cannot be evolving - just that it has to be evolving as a whole. I consider this a more 'mature' customer, that knows what it can do with their data, has a very integrated and standardised view on their business, with more and more complex requirements in terms of reporting. So if you need to model for flexibility in feeding datamarts, AND you have a strong ETL toolset, you might as well directly set up your datamodel to resemble business a bit more.

To summarise, I would argue that as the BI environment becomes more 'mature', business has learnt what it can do with the data and the demands on that side become more complex. Data Vault would not be the way to go on that side.

However if you are in a migration (especially with years-long parallel phases), or in a younger organisation where not all departments look at their business through the same eyes, but (in your advantage) the reports requirements are rather overseeable, it would be an option to use data vault up front and try to see if you can feed your datamarts directly from that - possibly adding a taste of Kimball's dimensions somewhere in between.


Favouring any approach is usually a matter of balancing experience and opinion with the needs and requirements for the system. Each modelling approach has certain advantages when related to different situations, so you must evaluate the environment your model will interact with when figuring out which approach to take.

Highly transactional systems that add data frequently and uniformly usually suit a dimensional modelling approach. Common examples used to describe it normally focus on Retail and Financial organisations, as the number of sales or monetary transactions being added over time suits the Fact and Dimension concepts.


Why do you feel you need either of them? They are mostly jargon-heavy design patterns used to sell books and training courses. Millions of people find they can get on just fine without them. What you really need to design a data warehouse is the same good analysis and modelling skills you need for any database.

If you are seeking useful advice on building a data warehouse then check out Bill Inmon's books. If this is your first Business Intelligence project then get some help from someone with experience in the field so that you can avoid some of the common pitfalls.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜