Location dimension table
Creating a location dimension table for the DW; I am familar with date/time but for location I am using these columns: Continent, Country, Region, City, Postal. Now the question is what if a record doesn't have a city or postal and ends with a region only. In the DW all records will have a city but in the fact table there may be data which won't have these as these are non required data points at survey collection, so how to go about designi开发者_StackOverflowng this table? Do i need to first insert rows for only countinent, only country, only city, only postal, etc while keeping the rest blank then building the relationships like continent to country, etc?
Two basic ways to deal with this.
Use unknown for missing values. So each city has an unknown postal code, each region has an unknown city. This way a location which ends with a
region
, hasCity='unknown' , Postal='unknown'
Simply use only columns that exist in all records -- in this case drop
city
andpostal
columns.
I have made a few location dimensions in my life and I am managing currently a system with a large location dimension. I have described how I made it in my blog. https://dimensionalmodelingblog.wordpress.com/creating-a-location-dimension-in-a-data-warehouse/
Location dimension is tricky, and Even Ralph Kimball recognizes that it is a challenge (see chapter 10 of Building the Data Warehouse).
In you case, you actually need 5 dimensions, one for each level and its levels above (one dimension for Continent, Country, Region, City, Postal, one for Continent, Country, Region, City, etc.) When you have data which has no city information, you use the region dimension, etc.
Instead of making 5 separate tables, I suggest to make everything in one table and create views on that table so that you maintain only one location dimension.
Your table would look like this Continent, Country, Region, City, Postal, Level1Flag, Level2Flag, Level3Flag, Level4Flag, Level5Flag
Your process flags all entries of the right level to its value, and the first entry of each level to the next level's value: Example you have 15 cities in the US Region of Colorado, each of them is flaged Level 4, and the first one is flaged Level3 Then your LocationCity view shows the first 4 columns and filters on the Level4Flag, and your LocationRegion view shows the first 3 columns and filters on the Level3Flag.
Then you have the best of both: one dimension table to maintain and 5 role views that operate like mini dimensions.
@Darmir's solution is interesting, its big plus is that it keeps the geographic data in one table, the downside is that you get a very large number of records with 'unknown' for Continent, Country, Region, City, Postal combinations - either generated on the fly during an ETL, or as a one off load (if it can be done definitively).
Obviously there is a natural hierarchy here, so we would want to try and make use of it.
But alternatively, I think it might be interesting to build a number of dimension tables instead of just one. In the worst case, you might have your Fact table with surrogate keys to each of DimContinent, DimCountry, Dimregion, DimCity and DimPostCode. But it might be possible with some profiling to group these tables together sensibily. Consider the folliowing questions...
- are there any fields that are (always/generally) filled in?
- are there sets of fields that if one is filled in then the others will be too?
- can you get some definitive reference data to enhance and fill in your missing data?
Following (2) you might find that if country is filled in then Continent is filed in, otherwise both are Unknown. This would then naturally suggest a DimCountry table which contains both these fields.
You say "In the DW all records will have a city", so if you could find a way to enrich your data (step 3) then you could produce a a DimCity table that has (Continent/Country/Region/City).
When you expose these separate Dimensions in the cube you will be able to build them into a hierarchy and then be able to use your hierarchy easily there.
Im not completely convinced in myself of this solution, but thought I would throw it forward in case it was of assistance.
精彩评论