SQL: best way to store yes/no values? Concerned about performance in huge databases
I have some columns where I have to store basically yes/no values. For example, user status for active or inactive, or newsletter subscription status for subscribed or unsubscribed.
Well, I want to know (considering tables with a lot of records) whether the best way is to use a tinyint column, setting 1 for yes and 0 for no.
Is this the correct approach? Or is there no impact on the performance of db queries when using words like yes, no, active, inactive, subscribed, etc.?
Thanks in advance.
Semantically, I suggest you use `bit` if it's available to you. When looking at the column, any other developer can immediately determine that a boolean value is stored in it. If you don't have `bit`, try using `tinyint`. Ensuring that `1` is the only true value and `0` is the only false value will bring consistency. Otherwise, you could end up with a messy mixture of `true`/`false`, `yes`/`no`, `valid`/`invalid`, `y`/`n`, and/or `t`/`f`.
Comparing `bit` or `tinyint` values probably isn't slower than comparing strings, and even if it were, I can't imagine it having a significant effect on overall speed.
Is there something you don't like about the 'bit' data type?
The most commonly supported means is to use `CHAR(1)`. On most databases it takes the same amount of space as `BIT` (1 byte, assuming `BIT` is available), but it supports more values (26 if case insensitive, 52 if not) if there's any chance of needing more than two states. Unlike `BIT`, `CHAR(1)` is human readable. Also, `BIT` isn't supported on every database.
If your RDBMS supports bitmap indexes, go for BIT every time. If it doesn't, use whatever you want; there is really no difference between char(1) and tinyint (one byte each).
Are you just asking in general, what the most efficient way to store a yes/no flag is? Or do you have a performance problem at hand?
If so, when do you have the performance problem (specific queries, inserts, maintenance etc)? What kind of performance gain are you looking for? 2%? 10%? 50%?
Changing datatypes will likely result in only a minor improvement unless we are talking about several hundred million rows. I will give you an example. Let's say that whatever change you made shaved off 3 bytes per row, and the table contains 100,000,000 rows. That would be a saving of ~286 MB. Assuming the disk subsystem can deliver 100 MB/s, you have saved a whopping 3 seconds on a full table scan. Something tells me that the users would think 2 hours and 3 seconds vs 2 hours is much the same :)
My intuition would have said performance would be better with tinyints, but this post doesn't really bear that out. This SO post also offers some other interesting opinions.
I do think that performing analysis with data stored as numbers is typically easier than character data. What other programs are you going to have to interface with and use? For example, several of my analysis tools do not read character data at all, so we have to recode any data we receive in the format of "yes", "no", etc.