
What's the database performance improvement from storing as numbers rather than text?

Suppose I have text such as "Win", "Lose", "Incomplete", "Forfeit", etc. I can directly store the text in the database. If I instead use numbers such as 0 = Win, 1 = Lose, etc., would I get a material improvement in database performance? I am asking specifically about queries where the field is part of my WHERE clause.


At the CPU level, comparing two fixed-size integers takes just one instruction, whereas comparing variable-length strings usually involves looping through each character. So for a very large dataset there should be a significant performance gain with using integers.
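To make the difference concrete, here is a rough Python model of a character-by-character comparison next to an integer comparison. It is only an illustration: real engines compare raw bytes under collation rules with optimized routines, and Python does not expose individual CPU instructions. The function names are hypothetical.

```python
def compare_strings(a: str, b: str) -> tuple[int, int]:
    """Lexicographic comparison, one character at a time.

    Returns (result, steps): result is -1, 0 or 1, and steps counts how many
    character positions were examined before the answer was known.
    """
    steps = 0
    for ca, cb in zip(a, b):
        steps += 1
        if ca != cb:
            return (-1 if ca < cb else 1), steps
    # Every shared position matched; the shorter string sorts first.
    if len(a) == len(b):
        return 0, steps
    return (-1 if len(a) < len(b) else 1), steps


def compare_ints(a: int, b: int) -> int:
    """A fixed-size integer comparison is one operation, whatever the value."""
    return (a > b) - (a < b)


if __name__ == "__main__":
    print(compare_strings("Incomplete", "Incomplete"))  # (0, 10): all ten characters checked
    print(compare_strings("Win", "Lose"))               # (1, 1): differs at the first character
    print(compare_ints(2, 2))                           # 0
```

Equal strings are the worst case: every character must be inspected before the comparison can report a match, which is exactly what an equality test in a WHERE clause does for each matching row.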

Moreover, a fixed-size integer will generally take less space, and because every value has the same width, the database engine can locate entries by computing their offsets (random seeking) instead of scanning through variable-length data.
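A toy model of the offset idea, assuming values are packed back to back in a single buffer. Real storage engines use pages, slot arrays and B-tree nodes, so this illustrates only why fixed width makes "jump to row i" a single calculation. The record layout and helper names are hypothetical.

```python
import struct

# Hypothetical layout: one 4-byte little-endian unsigned int per row.
RECORD = struct.Struct("<I")


def pack_fixed(values):
    """Store integers back to back; every record is RECORD.size bytes."""
    return b"".join(RECORD.pack(v) for v in values)


def read_fixed(buf: bytes, i: int) -> int:
    """Jump straight to row i: its offset is simply i * RECORD.size."""
    (value,) = RECORD.unpack_from(buf, i * RECORD.size)
    return value


def pack_variable(strings):
    """Store strings with a 1-byte length prefix (so at most 255 bytes each)."""
    out = bytearray()
    for s in strings:
        data = s.encode("utf-8")
        out.append(len(data))
        out += data
    return bytes(out)


def read_variable(buf: bytes, i: int) -> str:
    """To find row i we must walk past every earlier record's length prefix."""
    pos = 0
    for _ in range(i):
        pos += 1 + buf[pos]
    length = buf[pos]
    return buf[pos + 1 : pos + 1 + length].decode("utf-8")


ints = pack_fixed([0, 1, 2, 3])
texts = pack_variable(["Win", "Lose", "Incomplete", "Forfeit"])
print(read_fixed(ints, 2), read_variable(texts, 2))  # 2 Incomplete
```

`read_fixed` costs the same for row 2 or row 2,000,000, while `read_variable` grows with `i`.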

Many database systems, however, have an enum type that is meant for cases like yours: in the query you compare the field value against a fixed set of literals, while internally it is stored as an integer.
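Not every engine has an enum type. MySQL has `ENUM` and PostgreSQL has `CREATE TYPE ... AS ENUM`, but SQLite has neither, so a common workaround there is to keep the name-to-number mapping in application code. Here's a minimal Python sketch of that approach, assuming SQLite; the `games` table, `outcome` column and `Outcome` class are hypothetical names.

```python
import sqlite3
from enum import IntEnum


class Outcome(IntEnum):
    """Application-side enum; the database only ever sees the integer."""
    WIN = 0
    LOSE = 1
    INCOMPLETE = 2
    FORFEIT = 3


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, "
    "outcome INTEGER NOT NULL CHECK (outcome BETWEEN 0 AND 3))"
)
conn.executemany(
    "INSERT INTO games (outcome) VALUES (?)",
    [(int(o),) for o in (Outcome.WIN, Outcome.LOSE, Outcome.WIN, Outcome.FORFEIT)],
)

# Queries read like the text version but compare integers.
wins = conn.execute(
    "SELECT COUNT(*) FROM games WHERE outcome = ?", (int(Outcome.WIN),)
).fetchone()[0]

# Convert back to readable names when reading rows.
names = [Outcome(o).name for (o,) in conn.execute("SELECT outcome FROM games ORDER BY id")]
print(wins, names)  # 2 ['WIN', 'LOSE', 'WIN', 'FORFEIT']
```

The trade-off versus a native enum is that the mapping lives outside the database, so ad hoc SQL users see bare numbers.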


There might be significant performance gains if the column is used in an index.
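One way to check whether an index on the column is actually used is to ask the planner. This sketch assumes SQLite and its `EXPLAIN QUERY PLAN` statement; other engines have their own `EXPLAIN` variants, and with only four distinct values a planner that has statistics may decide a full scan is cheaper. Table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, player TEXT, outcome INTEGER)")
conn.executemany(
    "INSERT INTO games (player, outcome) VALUES (?, ?)",
    [(f"player{i}", i % 4) for i in range(10_000)],
)
conn.execute("CREATE INDEX idx_games_outcome ON games (outcome)")


def plan(sql, params=()):
    """Return the detail text of each row SQLite reports for the query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Exact wording varies by SQLite version, but it should mention the index.
print(plan("SELECT player FROM games WHERE outcome = ?", (0,)))
```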


It could range anywhere from negligible to extremely beneficial depending on the table size, the number of possible values being enumerated, and the database engine and its configuration.

That said, using a number to represent an enumerated type will almost never perform worse.


Don't guess. Measure.

Performance depends on how selective the index is (how many distinct values are in it), whether critical information is available in the natural key, how long the natural key is, and so on. You really need to test with representative data.
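In that spirit, here is a minimal measuring harness, assuming an in-memory SQLite database as a stand-in. Timings from it will not transfer to another engine, and uniformly random outcomes are not representative data, so treat it as a template to point at your own schema. All table and function names are hypothetical.

```python
import random
import sqlite3
import time

OUTCOMES = ["Win", "Lose", "Incomplete", "Forfeit"]


def build(n_rows: int, seed: int = 42) -> sqlite3.Connection:
    """Create two otherwise identical indexed tables: one stores text, one codes."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE games_text (id INTEGER PRIMARY KEY, outcome TEXT)")
    conn.execute("CREATE TABLE games_int (id INTEGER PRIMARY KEY, outcome INTEGER)")
    codes = [rng.randrange(len(OUTCOMES)) for _ in range(n_rows)]
    conn.executemany("INSERT INTO games_text (outcome) VALUES (?)", [(OUTCOMES[c],) for c in codes])
    conn.executemany("INSERT INTO games_int (outcome) VALUES (?)", [(c,) for c in codes])
    conn.execute("CREATE INDEX idx_text ON games_text (outcome)")
    conn.execute("CREATE INDEX idx_int ON games_int (outcome)")
    return conn


def time_query(conn, sql, params, repeats=20):
    """Best-of-N wall-clock time for one query, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    conn = build(200_000)
    t_text = time_query(conn, "SELECT COUNT(*) FROM games_text WHERE outcome = ?", ("Incomplete",))
    t_int = time_query(conn, "SELECT COUNT(*) FROM games_int WHERE outcome = ?", (2,))
    print(f"text: {t_text * 1000:.2f} ms   int: {t_int * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from caching and scheduling; vary `n_rows` and the value distribution to see how selectivity changes the picture.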

When I was designing the database for my employer's operational data store, I built a testbed with tables designed around natural keys and with tables designed around id numbers. Both schemas have more than 13 million rows of computer-generated sample data. In a few cases, queries on the id number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with id numbers took 30 seconds with natural keys.) But 80% of the test queries had faster SELECT performance against the natural key schema, and sometimes it was staggeringly faster: a difference of 30 to 1.

The reason, of course, is that lots of the queries on the natural key schema need no joins at all; the most commonly needed information is naturally carried in the natural key. (I know that sounds odd, but it happens surprisingly often. How often is probably application-dependent.) But zero joins is often going to be faster than three joins, even if you join on integers.
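A toy illustration of the join argument, assuming SQLite and two hypothetical designs: `games_id` references lookup tables by integer id, while `games_nk` carries the readable values in its natural key. It shows only that the natural-key version answers the same question with no joins; it says nothing about which is faster on your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Surrogate-key design: readable values live in lookup tables.
CREATE TABLE players  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE outcomes (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE games_id (
    id         INTEGER PRIMARY KEY,
    player_id  INTEGER NOT NULL REFERENCES players(id),
    outcome_id INTEGER NOT NULL REFERENCES outcomes(id),
    played_on  TEXT NOT NULL
);

-- Natural-key design: readable values are the key itself.
CREATE TABLE games_nk (
    player    TEXT NOT NULL,
    played_on TEXT NOT NULL,
    outcome   TEXT NOT NULL,
    PRIMARY KEY (player, played_on)
);

INSERT INTO players  VALUES (1, 'alice'), (2, 'bob');
INSERT INTO outcomes VALUES (0, 'Win'), (1, 'Lose'), (2, 'Incomplete'), (3, 'Forfeit');
INSERT INTO games_id (player_id, outcome_id, played_on) VALUES
    (1, 0, '2024-01-01'), (1, 3, '2024-01-02'), (2, 1, '2024-01-01');
INSERT INTO games_nk VALUES
    ('alice', '2024-01-01', 'Win'), ('alice', '2024-01-02', 'Forfeit'),
    ('bob',   '2024-01-01', 'Lose');
""")

# Same question ("alice's results by date") answered two ways.
with_joins = conn.execute("""
    SELECT g.played_on, o.name
    FROM games_id g
    JOIN players  p ON p.id = g.player_id
    JOIN outcomes o ON o.id = g.outcome_id
    WHERE p.name = ?
    ORDER BY g.played_on
""", ("alice",)).fetchall()

no_joins = conn.execute(
    "SELECT played_on, outcome FROM games_nk WHERE player = ? ORDER BY played_on",
    ("alice",),
).fetchall()

print(with_joins == no_joins, no_joins)
```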


Clearly if your data structures are shorter, they are faster to compare AND faster to store and retrieve.

How much faster? 1x, 2x, 1000x? It all depends on the size of the table and so on.

For example: say you have a table with a productId and a varchar text column.

Each row will take roughly 4 bytes for the int and then another 3 to 24 bytes for the text in your example (depending on whether the column is nullable or Unicode).

Compare that to 5 bytes per row for the same data with a one-byte status column.

This huge space saving means more rows fit in a page, more data fits in the cache, fewer writes happen when you load or store data, and so on.
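A back-of-the-envelope version of that arithmetic in Python. It assumes an 8 KB page (the default in SQL Server and PostgreSQL) and counts only the column payloads, ignoring length prefixes, null bitmaps and per-row headers, so real figures will be lower; "Unicode" here means UTF-16, as in SQL Server's `nvarchar`.

```python
PAGE_SIZE = 8192  # assumed page size in bytes; varies by engine and configuration
INT_BYTES = 4     # productId stored as a 4-byte integer
STATUS_BYTE = 1   # status stored as a single-byte code


def text_bytes(status: str, unicode: bool) -> int:
    """Payload size of the status text alone (no length prefix or null bitmap)."""
    return len(status.encode("utf-16-le" if unicode else "ascii"))


def rows_per_page(row_bytes: int) -> int:
    """Rows that fit in one page, ignoring per-row and per-page overhead."""
    return PAGE_SIZE // row_bytes


for status in ["Win", "Lose", "Incomplete", "Forfeit"]:
    narrow = INT_BYTES + text_bytes(status, unicode=False)
    wide = INT_BYTES + text_bytes(status, unicode=True)
    print(f"{status:<11} ascii row={narrow:>2} B ({rows_per_page(narrow)} rows/page), "
          f"unicode row={wide:>2} B ({rows_per_page(wide)} rows/page)")

coded = INT_BYTES + STATUS_BYTE
print(f"byte code   row={coded} B ({rows_per_page(coded)} rows/page)")
```

With "Incomplete" in a Unicode column that is 24 bytes and 341 rows per page, versus 5 bytes and 1638 rows per page for the one-byte code: close to five times as many rows per page read.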

Also, comparing strings is, in the best case, as fast as comparing bytes and, in the worst case, much slower.

There is a second huge issue with storing text where you intended to have an enum: what happens when people start storing "Incompete" as opposed to "Incomplete"?
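A small SQLite sketch of that failure mode, with hypothetical table names. Strictly speaking it is the constraint, not the integer type, that keeps bad values out (a `CHECK` listing the four strings would also work), but a short closed set of codes makes such a constraint natural to write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Free-form text: anything goes in, including typos.
conn.execute("CREATE TABLE games_free (outcome TEXT)")
# Constrained codes: only the four known outcomes are accepted.
conn.execute(
    "CREATE TABLE games_coded (outcome INTEGER NOT NULL CHECK (outcome IN (0, 1, 2, 3)))"
)

conn.execute("INSERT INTO games_free VALUES ('Incompete')")  # silently accepted
missed = conn.execute(
    "SELECT COUNT(*) FROM games_free WHERE outcome = 'Incomplete'"
).fetchone()[0]
print("rows found by WHERE outcome = 'Incomplete':", missed)  # 0

try:
    conn.execute("INSERT INTO games_coded VALUES (7)")  # not a known outcome
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("invalid code rejected:", rejected)  # True
```

The typo is not an error in the free-text table; it simply makes the row invisible to every query that spells the value correctly.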


Having a skinnier column means that you can fit more rows per page.

There is a HUGE difference between a varchar(20) and an integer.
