开发者

What is the general feeling on efficient database shapes?

I have some rather large datasets and am using the hierarchical data module pytables. In desiging the databases, I'm wondering if it would be faster to create tables with many rows and only 1 value column, or in more common 'matrix' format.

For example:

from_cty, to_cty, value

austin, new york, 9000

dallas, new york, 8000

new york, dallas, 8400

au开发者_StackOverflow社区stin, dallas, 5030

dallas, austin, 4022

new york, austin, 8002

etc...

--or--

Cities'''''''', austin, dallas, new york

austin''''''', 0000, 5030, 9000

dallas''''''', 4022, 0000, 8000

new york, 8002, 8400, 0000

Benefits of the first type could include being able to pull a single column as once, selected by from or to cities. But it adds a column that would normally be column names.


Always start off with a normalized model having your primary usage of the table in mind.

Judging from the example data you have shown, it seems likely that additional cities will be added. If you go with your second example, that would require schema modifications, which is typically a bad thing.

If the need arises to query the data in the second format, you can always provide a view or even a permanent table that you periodically refresh. Or if your primary usage pattern require you to access the data in that format, you should of course model it accordingly.

Try it, measure it and make appropriate adjustments.


What is the general feeling on efficient database shapes?

Providing that in this model "Austin-Dallas" is the same as "Dallas-Austin" you can prevent reverse-duplicates in the Distance table by using

ALTER TABLE Distance ADD CONSTRAINT chk_id CHECK (FromCityId < ToCityId);

Then when querying you can use something like:

select
      a.CityName as FromCity
    , b.CityName as ToCity
    , Value      as Travel
from Distance as d
join City     as a on a.CityId = d.FromCityId
join City     as b on b.CityId = d.ToCityId
where (a.CityName = 'Austin' and b.CityName = 'Dallas')
   or (a.CityName = 'Dallas' and b.CityName = 'Austin') ;

EDIT: Just noticed that in your example "Austin-Dallas" is not the same as "Dallas-Austin", so no need for the check constraint and the second row in the WHERE clause.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜