
mysql database chained tables denormalization

I have a design issue with db denormalization.

I'm making a relatively large database and need to optimize it as much as possible.

Here is a very simplified model of the issue.

All the tables from the picture are chained, and with a normalized database, to get (for example) all the users from a specific country I have to join all the tables. That is approx. 250 countries x approx. 12,000 cities x approx. 625,000 regions x ? addresses x ? users... In short, that is a lot of joining, which takes a long time.

What I want to do is make the country_id redundant in the user table, so I can run the same query without any joins.
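To illustrate (the table and column names here are hypothetical, since the actual schema is only in the picture), the two variants of the query would look roughly like this:

-- Normalized: walk the whole chain to filter users by country.
SELECT u.*
FROM users u
JOIN addresses a  ON a.address_id  = u.address_id
JOIN regions   r  ON r.region_id   = a.region_id
JOIN cities    c  ON c.city_id     = r.city_id
JOIN countries co ON co.country_id = c.country_id
WHERE co.name = 'Croatia';

-- Denormalized: with a redundant country_id copied into users.
SELECT u.*
FROM users u
WHERE u.country_id = 42;   -- hypothetical id for 'Croatia'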

The question is: what is the best practice for keeping consistency in such a model (btw. I'm using MySQL)?

One way, and probably the fastest, is to ensure consistency at the application level when inserting/updating/deleting data.

Another is stored procedures, in which I really see no advantage. They ensure consistency only if called directly; the consistency breaks as soon as changes are made without going through the procedures.

I have also been looking at triggers... I'm not really sure how to implement them, or how much I would gain in performance.

Anyway, it would be preferable to ensure consistency on the db level.

Any advice?



All the tables from the picture are chained, and with a normalized database, to get (for example) all the users from a specific country I have to join all the tables.

You have to join all the tables, because you use surrogate keys (id numbers), not because the tables are "normalized". Using surrogate keys like id numbers has nothing to do with normalization.

Natural keys and foreign key constraints are all you need to solve your problem.

The easiest way to see how this works is to start with full data, and work completely backwards. Assume all the data is correct.

addr_id  street           street_num       region    city          country
--
1        Babukićeva       3a               10000     Zagreb        Croatia
2        Riva             16               51000     Rijeka        Croatia
3        Andrije Hebranga 2-4              10000     Zagreb        Croatia
4        Andrijevićeva    2                110000    Zagreb        Croatia
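(As a reference point, the addresses table behind that sample data might be declared roughly like this; the column types are my assumptions, not something taken from your schema.)

CREATE TABLE addresses (
    addr_id    INT          NOT NULL,
    street     VARCHAR(100) NOT NULL,
    street_num VARCHAR(10)  NOT NULL,
    region     VARCHAR(20)  NOT NULL,
    city       VARCHAR(50)  NOT NULL,
    country    VARCHAR(50)  NOT NULL,
    PRIMARY KEY (addr_id)
);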

To record facts like "Region '10000' is associated with the city 'Zagreb' in the country 'Croatia'", create a new table, and populate it from this query.

SELECT DISTINCT region, city, country FROM addresses;

The table will look like this.

Table: regions
Primary key: {region, city, country}

region   city      country
--
10000    Zagreb    Croatia
51000    Rijeka    Croatia
110000   Zagreb    Croatia
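One way to create and fill it (a sketch; the types are assumed to match the addresses columns):

CREATE TABLE regions (
    region  VARCHAR(20) NOT NULL,
    city    VARCHAR(50) NOT NULL,
    country VARCHAR(50) NOT NULL,
    PRIMARY KEY (region, city, country)
);

INSERT INTO regions (region, city, country)
SELECT DISTINCT region, city, country FROM addresses;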

Then set a foreign key reference.

ALTER TABLE addresses 
ADD CONSTRAINT FOREIGN KEY        (region, city, country) 
               REFERENCES regions (region, city, country);

To record facts like "City 'Zagreb' is in country 'Croatia'", create a new table, populate it from this query.

SELECT DISTINCT city, country FROM regions;

The table will look like this.

Table: cities
Primary key: {city, country}

city      country
--
Zagreb    Croatia
Rijeka    Croatia

Then set a foreign key reference.

ALTER TABLE regions 
ADD CONSTRAINT FOREIGN KEY       (city, country) 
               REFERENCES cities (city, country);

Repeat for countries. The tables countries, cities, and regions are all-key (every column is part of the key), so they're in 5NF. (They can't have any non-key dependencies, because they have no non-key columns.) In the context of a large area, like all of Europe, it's likely that the table of addresses is also in 5NF.
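The countries step, spelled out the same way (a sketch, using the same assumed types):

CREATE TABLE countries (
    country VARCHAR(50) NOT NULL,
    PRIMARY KEY (country)
);

INSERT INTO countries (country)
SELECT DISTINCT country FROM cities;

ALTER TABLE cities 
ADD CONSTRAINT FOREIGN KEY          (country) 
               REFERENCES countries (country);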

In terms of query performance, this is likely to run rings around your current schema, because it needs no joins.
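For example, assuming your users table ends up referencing addresses by the same natural key (so region, city, and country are columns of users), the query from the question becomes a single-table lookup:

SELECT *
FROM users
WHERE country = 'Croatia';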

You'll probably want to use ON UPDATE CASCADE; you might not want to cascade deletes, though.
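For example, the addresses constraint above, declared with cascading updates (drop and re-add the constraint, or include the clause when you first create it):

ALTER TABLE addresses 
ADD CONSTRAINT FOREIGN KEY        (region, city, country) 
               REFERENCES regions (region, city, country)
               ON UPDATE CASCADE;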


First of all - is it really too slow? Have you tried it? Do you have an app where you dump all users (why?), or are you fetching a single user or a few users now and then? Since you have primary keys on all those ids, retrieval shouldn't be that slow; there is a B-tree in the background, after all.

Secondly, I wouldn't normalize down to the street-number level; you'll hardly get any benefit from that, and you'll probably end up with an almost 1:1 relationship between a user and an address. So move the street number into the user table, or even move the entire address table there. I would probably also fold the region table into the user (these are city regions?) and end up with just the user, city, and country tables.
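A sketch of that reduced schema (all names and types here are assumptions):

CREATE TABLE countries (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

CREATE TABLE cities (
    id         INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL,
    country_id INT NOT NULL,
    FOREIGN KEY (country_id) REFERENCES countries (id)
);

CREATE TABLE users (
    id         INT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    street     VARCHAR(100),
    street_num VARCHAR(10),
    region     VARCHAR(20),           -- the city region, folded into the user row
    city_id    INT NOT NULL,
    FOREIGN KEY (city_id) REFERENCES cities (id)
);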

Then you'd have two joins. If that is still too slow, you can put the redundant country key (or, since we're denormalizing, the country name) in the users table. I'd use triggers to maintain integrity; more precisely, you'd have to write insert and update triggers on users (the update trigger only needs to fire when the regionId/cityId changes), plus an update trigger on country (if you keep the country name in the user table) for the unlikely event that a country's name changes.
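A minimal sketch of those triggers, assuming a users.country_id column and the cities(id, country_id) layout from the schema above:

DELIMITER $$

CREATE TRIGGER users_bi BEFORE INSERT ON users
FOR EACH ROW
BEGIN
    -- copy the country from the chosen city
    SET NEW.country_id = (SELECT country_id FROM cities WHERE id = NEW.city_id);
END$$

CREATE TRIGGER users_bu BEFORE UPDATE ON users
FOR EACH ROW
BEGIN
    -- only refresh the redundant column when the city actually changes
    IF NEW.city_id <> OLD.city_id THEN
        SET NEW.country_id = (SELECT country_id FROM cities WHERE id = NEW.city_id);
    END IF;
END$$

DELIMITER ;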

Performance-wise you wouldn't gain anything with triggers (you'd lose a little), but I suppose inserts and updates on the user table are not so frequent that you'd notice it at all.

Finally, since you didn't explain the nature and size of your (web?) app in detail, just a reminder that you might also want to consider other optimization technologies outside the relational database (caches, NoSQL stores, etc.).


It's not really a lot of joining, assuming you are not trying to denormalize all your data.

On the other hand, most people just have an address table, or (gasp!) keep the address information in the user table. How many countries/cities are you expecting to support, vs. how many users?

