What is the general feeling on efficient database shapes?
I have some rather large datasets and am using the hierarchical data module pytables. In desiging the databases, I'm wondering if it would be faster to create tables with many rows and only 1 value column, or in more common 'matrix' format.
For example:
from_cty, to_cty, value
austin, new york, 9000 dallas, new york, 8000 new york, dallas, 8400 au开发者_StackOverflow社区stin, dallas, 5030 dallas, austin, 4022 new york, austin, 8002 etc...--or--
Cities'''''''', austin, dallas, new york
austin''''''', 0000, 5030, 9000 dallas''''''', 4022, 0000, 8000 new york, 8002, 8400, 0000Benefits of the first type could include being able to pull a single column as once, selected by from or to cities. But it adds a column that would normally be column names.
Always start off with a normalized model having your primary usage of the table in mind.
Judging from the example data you have shown, it seems likely that additional cities will be added. If you go with your second example, that would require schema modifications, which is typically a bad thing.
If the need arises to query the data in the second format, you can always provide a view or even a permanent table that you periodically refresh. Or if your primary usage pattern require you to access the data in that format, you should of course model it accordingly.
Try it, measure it and make appropriate adjustments.
Providing that in this model "Austin-Dallas" is the same as "Dallas-Austin" you can prevent reverse-duplicates in the Distance
table by using
ALTER TABLE Distance ADD CONSTRAINT chk_id CHECK (FromCityId < ToCityId);
Then when querying you can use something like:
select
a.CityName as FromCity
, b.CityName as ToCity
, Value as Travel
from Distance as d
join City as a on a.CityId = d.FromCityId
join City as b on b.CityId = d.ToCityId
where (a.CityName = 'Austin' and b.CityName = 'Dallas')
or (a.CityName = 'Dallas' and b.CityName = 'Austin') ;
EDIT:
Just noticed that in your example "Austin-Dallas" is not the same as "Dallas-Austin", so no need for the check constraint and the second row in the WHERE
clause.
精彩评论