What's the database performance improvement from storing as numbers rather than text?

2023-02-13 14:54

Suppose I have text such as "Win", "Lose", "Incomplete", "Forfeit" etc. I can store the text directly in the database. If I instead use numbers such as 0 = Win, 1 = Lose etc., would I get a material improvement in database performance? Specifically on queries where the field is part of my WHERE clause.

At the CPU level, comparing two fixed-size integers takes just one instruction, whereas comparing variable-length strings usually involves looping through each character. So for a very large dataset there should be a significant performance gain with using integers.
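To make the difference concrete, here is a simplified Python model of the two kinds of comparison. It is not how any particular engine works: real databases use optimized routines such as memcmp or collation-aware comparisons. The function names are hypothetical. The naive string comparison walks the characters until it finds a difference, while the integer comparison is a single operation.

```python
from __future__ import annotations


def compare_strings(a: str, b: str) -> tuple[int, int]:
    """Naive character-by-character comparison (simplified model).

    Returns (result, steps): result is -1, 0 or 1, and steps is the
    number of character positions examined before a decision.
    """
    steps = 0
    for ca, cb in zip(a, b):
        steps += 1
        if ca != cb:
            return (-1 if ca < cb else 1), steps
    # Every shared position matched; the shorter string sorts first.
    if len(a) == len(b):
        return 0, steps
    return (-1 if len(a) < len(b) else 1), steps


def compare_ints(a: int, b: int) -> tuple[int, int]:
    """A fixed-size integer comparison is modelled as one step."""
    return (a > b) - (a < b), 1
```

In this model, "Win" vs "Lose" is decided at the first character (1 step). Confirming that "Incomplete" equals "Incomplete" has to examine all 10 characters, and comparing two status codes always takes one step.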

Moreover, a fixed-size integer will generally take less space and can allow the database engine to use faster algorithms based on random access, since fixed-size values can be located by computing an offset rather than by scanning.

Many database systems (MySQL and PostgreSQL, for example), however, have an enum type meant for cases like yours: in the query you compare the field value against a fixed set of literals, while internally it is stored compactly as a number.
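SQLite, for example, has no enum type. One rough way to get a similar effect there is to keep the name-to-number mapping in application code. The sketch below uses Python's IntEnum. The Outcome class, the games table and the count_outcome helper are hypothetical. The code reads by name, but the database only ever stores and compares small integers.

```python
import sqlite3
from enum import IntEnum


class Outcome(IntEnum):
    """Hypothetical mapping mirroring the question's 0 = Win, 1 = Lose, ..."""
    WIN = 0
    LOSE = 1
    INCOMPLETE = 2
    FORFEIT = 3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The column holds small integers; the names live only in Python.
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, outcome INTEGER NOT NULL)"
    )
    results = [Outcome.WIN, Outcome.LOSE, Outcome.WIN, Outcome.FORFEIT]
    conn.executemany(
        "INSERT INTO games (outcome) VALUES (?)", [(int(o),) for o in results]
    )
    return conn


def count_outcome(conn: sqlite3.Connection, outcome: Outcome) -> int:
    # The caller passes a name (Outcome.WIN); the WHERE clause compares integers.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM games WHERE outcome = ?", (int(outcome),)
    ).fetchone()
    return n
```

Reading a row back, Outcome(value).name recovers the label, so the integer encoding stays invisible to the rest of the application.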

There might be significant performance gains if the column is used in an index.
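One way to check whether a WHERE clause on the status column actually uses an index is to ask the query planner. This sketch uses SQLite's EXPLAIN QUERY PLAN. The table and the index name idx_games_status are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def plan_for_status_query(with_index: bool) -> str:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, status_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO games (status_id) VALUES (?)",
        [(i % 4,) for i in range(10_000)],
    )
    if with_index:
        # Hypothetical index name; keys on a small integer column stay compact.
        conn.execute("CREATE INDEX idx_games_status ON games (status_id)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM games WHERE status_id = ?", (1,)
    ).fetchall()
    conn.close()
    # Each plan row has four columns; the human-readable detail is the last one.
    return " | ".join(row[3] for row in rows)
```

Without the index the plan reports a full SCAN of the table. With it, the plan names idx_games_status and only has to read that index.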

It could range anywhere from negligible to extremely beneficial depending on the table size, the number of possible values being enumerated and the database engine / configuration.

That said, it almost certainly will never perform worse to use a number to represent an enumerated type.

Don't guess. Measure.

Performance depends on how selective the index is (how many distinct values are in it), whether critical information is available in the natural key, how long the natural key is, and so on. You really need to test with representative data.
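A minimal measuring harness might look like the following, using in-memory SQLite as a stand-in for your real engine. The table names, the row count and the uniform random distribution of statuses are all assumptions. Replace them with your own schema and representative data, because in-memory SQLite timings say little about a disk-based server.

```python
import random
import sqlite3
import time

STATUSES = ["Win", "Lose", "Incomplete", "Forfeit"]


def build(n_rows: int, seed: int = 42) -> sqlite3.Connection:
    """Create the same data twice: once as text, once as integer codes."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE games_text (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE games_int (id INTEGER PRIMARY KEY, status_id INTEGER NOT NULL)")
    picks = [rng.randrange(len(STATUSES)) for _ in range(n_rows)]
    conn.executemany("INSERT INTO games_text (status) VALUES (?)",
                     [(STATUSES[p],) for p in picks])
    conn.executemany("INSERT INTO games_int (status_id) VALUES (?)",
                     [(p,) for p in picks])
    conn.execute("CREATE INDEX idx_text ON games_text (status)")
    conn.execute("CREATE INDEX idx_int ON games_int (status_id)")
    return conn


def time_query(conn, sql, param, repeats=20):
    """Return (best wall-clock seconds over several runs, query result)."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = conn.execute(sql, (param,)).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return best, result


def compare(n_rows: int = 100_000) -> dict:
    conn = build(n_rows)
    t_text, c_text = time_query(
        conn, "SELECT COUNT(*) FROM games_text WHERE status = ?", "Incomplete")
    t_int, c_int = time_query(
        conn, "SELECT COUNT(*) FROM games_int WHERE status_id = ?",
        STATUSES.index("Incomplete"))
    conn.close()
    return {"text_seconds": t_text, "int_seconds": t_int,
            "text_count": c_text, "int_count": c_int}


if __name__ == "__main__":
    print(compare())
```

Checking that both queries return the same count guards against comparing two queries that do different work.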

When I was designing the database for my employer's operational data store, I built a testbed with tables designed around natural keys and with tables designed around id numbers. Both those schemas have more than 13 million rows of computer-generated sample data. In a few cases, queries on the id number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with id numbers took 30 seconds with natural keys.) But 80% of the test queries had faster SELECT performance against the natural key schema. And sometimes it was staggeringly faster--a difference of 30 to 1.

The reason, of course, is that lots of the queries on the natural key schema need no joins at all--the most commonly needed information is naturally carried in the natural key. (I know that sounds odd, but it happens surprisingly often. How often is probably application-dependent.) But zero joins is often going to be faster than three joins, even if you join on integers.

Clearly if your data structures are shorter, they are faster to compare AND faster to store and retrieve.

How much faster? 1x, 2x, 1000x? It all depends on the size of the table and so on.

For example: say you have a table with a productId and a varchar text column.

Each row will take roughly 4 bytes for the int and then another 3 to 24 bytes for the text in your example (depending on whether the column is nullable or Unicode).

Compare that to 5 bytes per row for the same data with a byte status column.

This huge space saving means more rows fit in a page, more data fits in the cache, fewer writes happen when you load or store data, and so on.
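Here is a rough back-of-the-envelope calculation of rows per page. It assumes about 8,060 bytes of usable row space per page (SQL Server's maximum in-row size for its 8 KB pages). It ignores per-row headers and slot overhead, which in practice shrink all of these numbers. The 1,000,000-row table is hypothetical.

```python
# Assumption: ~8,060 usable bytes per page; per-row overhead is ignored.
USABLE_PAGE_BYTES = 8060


def rows_per_page(row_bytes: int, page_bytes: int = USABLE_PAGE_BYTES) -> int:
    """Number of whole rows of the given size that fit in one page."""
    return page_bytes // row_bytes


def pages_needed(n_rows: int, row_bytes: int) -> int:
    """Pages required to hold n_rows rows (ceiling division)."""
    return -(-n_rows // rows_per_page(row_bytes))


# Row layouts from the answer: a 4-byte int id plus the status.
TEXT_BEST = 4 + 3     # "Win" as 3 single-byte characters
TEXT_WORST = 4 + 24   # upper estimate for nullable / Unicode text
BYTE_STATUS = 4 + 1   # one-byte status code

for label, size in [("text, best case", TEXT_BEST),
                    ("text, worst case", TEXT_WORST),
                    ("byte status", BYTE_STATUS)]:
    print(f"{label:17} {size:3} bytes/row  {rows_per_page(size):5} rows/page  "
          f"{pages_needed(1_000_000, size):6} pages for 1M rows")
```

Under these assumptions the one-byte status packs 1,612 rows into a page versus 287 for the worst-case text row. A scan of the worst-case text table therefore reads more than five times as many pages.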

Also, comparing strings is, in the best case, as fast as comparing bytes and, in the worst case, much slower.

There is a second huge issue with storing text where you intended to have an enum: what happens when people start storing Incompete instead of Incomplete?
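A common way to rule out typos like Incompete is to keep the allowed values in a lookup table and reference them by integer id through a foreign key. This sketch uses SQLite and assumes hypothetical status and games tables and a hypothetical record_game helper. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is set on the connection.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this per-connection pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
    conn.executemany(
        "INSERT INTO status (id, name) VALUES (?, ?)",
        [(0, "Win"), (1, "Lose"), (2, "Incomplete"), (3, "Forfeit")],
    )
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, "
        "status_id INTEGER NOT NULL REFERENCES status (id))"
    )
    return conn


def record_game(conn: sqlite3.Connection, status_name: str) -> None:
    # Resolve the label to its id; an unknown label such as "Incompete" fails here.
    row = conn.execute(
        "SELECT id FROM status WHERE name = ?", (status_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown status: {status_name!r}")
    conn.execute("INSERT INTO games (status_id) VALUES (?)", (row[0],))
```

A misspelled label is rejected by the lookup. A raw id that has no matching status row is rejected by the database itself with sqlite3.IntegrityError. A CHECK constraint listing the allowed strings is a lighter alternative if you keep the text column.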

Having a skinnier column means that you can fit more rows per page.

There is a HUGE difference between a varchar(20) and an integer.
