Integer vs char for DB record property vs Wordpress schema

2023-02-02 09:24 问答作者：

I asked a similar question before (integer-vs-char-for-db-record-property) but stumbled upon something that goes against all recommendations that I received in my previous post. In Wordpress 3, the most popular and mature open source blog script, post status is stored as VARCHAR(20) in db - 'publish', 'auto-draft', 'inherit', 'pending', etc. and not as INT with lookup table or mapped string constants, or CHAR, or anything like that. This also applies to the field post_type ('post', 'attachment', 'revision', etc.) and some other fields. So to find all published posts I need to run something like SELE开发者_如何转开发CT * FROM posts WHERE post_status = 'published' AND post_type = 'post'. Also, there is a multiple column index on post_status, post_type and some other columns which certainly speeds up this kind of search. Can someone explain why they made it this way and not the other, and what are the benefits and drawbacks of this approach?

Just because some application is well known doesn't mean they had a good database design. This tends to violate the normalization rules. Maybe they get better performance and maybe they didn't look at the other possibilities when they chose this one becasue they didn;t know better. Maybe they were aplication programmers designing a database without understanding database theory very well or maybe it was a deliberate denormailzation with performance stats to back it up. Or maybe they didn't think the chances of having to update 100 million records when we decided we wanted to change the value from 'published' to something else. Maybe they only tested performance on selects but not on updates. Maybe the values genuniely are unlikey to change, so it's not such a big deal to denormalize. We can't know from here.

Normalization is not about replacing strings with numbers or "sharing" strings just because they have the same letters.

I don't know their design, but the following scenario is perfectly normalized even though it uses strings as identifiers.

create table post_statuses(
   status varchar(20) not null
  ,primary key(status)
);

insert into post_statuses values('publish');
insert into post_statuses values('inherit');
insert into post_statuses values('pending');

create table posts(
   post_id ...
   status varchar(20) not null
  ,primary key(post_id)
  ,foreign key(status) references post_statuses(status)
);

The main benefit of using natural keys over surrogate keys is that it reduces the number of joins needed and also the likelyhood that entire classes of queries can be answered from index only. The main drawbacks are increased storage and the possibility of having a hell if we need to change the values.

I'd guess that the WP devs simply avoided what they felt was premature optimization, and opted for better readability instead.

"SELECT * FROM posts WHERE post_status = 'published' AND post_type = 'post'"

is a little little bit easier to read than

"SELECT * FROM posts WHERE post_status = ".WP_POST_STATUS_PUBLISHED."
    AND post_type = ".WP_POST_TYPE_POST.""

And when a new WP developer runs a select * from ... query, the database table lists 'published' rather than 3 or 5, which is easier to understand and debug.

From the point of view of disk storage space, either approach is fairly okay I think — some more post_status bytes shouldn't matter much in comparison to the blog post text and all other columns. An integer is 8 bytes (well unless it's a tinyint) and 'published' is perhaps 10 bytes so doesn't matter much?

继续阅读：database database-design sql wordpress

Integer vs char for DB record property vs Wordpress schema

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？