Why specify a length for character varying types
Referring to the Postgres Documentation on Character Types, I am unclear on the point of specifying a length for character varying (varchar) types.
Assumptions:
- the length of the string doesn't matter to the application
- you don't care if someone stores a string of the maximum size in the database
- you have unlimited hard disk space
It does mention:
The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different.)
This talks about the size of the string, not the size of the field (i.e. it sounds like Postgres will always compress a large string in a large varchar field, but not a small string in a large varchar field?).
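For concreteness, here is a quick way to check what actually gets stored (pg_column_size is a standard Postgres function; the table name is just illustrative, and the expected sizes are what I'd guess from the docs quoted above):

```sql
-- Check actual stored size; the declared varchar(500) should not matter.
CREATE TABLE overhead_demo (short_val varchar(500), long_val varchar(500));
INSERT INTO overhead_demo VALUES ('abc', repeat('x', 200));

-- Per the docs I'd expect ~4 bytes (1-byte header + 3 chars) for the short
-- string and ~204 bytes (4-byte header + 200 chars) for the long one.
SELECT pg_column_size(short_val) AS short_size,
       pg_column_size(long_val)  AS long_size
FROM overhead_demo;
```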
I ask this question as it would be much easier (and lazier) to specify a much larger size so you never have to worry about having a string too large. For example, if I specify varchar(50) for a place name, I will run into places whose names have more characters (e.g. Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch), but if I specify varchar(100) or varchar(500), I'm less likely to hit that problem.
So would you get a performance hit between varchar(500) and (arbitrarily) varchar(5000000) or text if your largest string were, say, 400 characters long?
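To illustrate, this is the failure mode I'm trying to avoid by over-sizing (a sketch; the table name is illustrative):

```sql
CREATE TABLE places (name varchar(50));

-- The 58-character Welsh place name exceeds the declared limit:
-- ERROR:  value too long for type character varying(50)
INSERT INTO places
VALUES ('Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch');
```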
Also, out of interest: if anyone has the answer to this AND knows the answer for other databases, please add that too.
I have Googled but have not found a sufficiently technical explanation.
My understanding is that constraints are useful for data integrity, so I use column sizes both to validate data items at a lower layer and to better document the data model.
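As a sketch of what I mean (table and column names are illustrative), the length limit can live either in the type or in an explicit constraint:

```sql
-- Length limit expressed in the type itself:
CREATE TABLE people (
    name varchar(100) NOT NULL
);

-- The same integrity rule expressed as an explicit constraint on text;
-- the limit can later be changed by swapping the constraint rather
-- than altering the column type.
CREATE TABLE people_alt (
    name text NOT NULL,
    CONSTRAINT name_length CHECK (char_length(name) <= 100)
);
```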
Some links on the matter:
- VARCHAR(n) Considered Harmful
- CHAR(x) vs. VARCHAR(x) vs. VARCHAR vs. TEXT
- In Defense of varchar(x)
My understanding is that this is a legacy of older databases with storage that wasn't as flexible as that of Postgres. Some would use fixed-length structures to make it easy to find particular records and, since SQL is a somewhat standardized language, that legacy is still seen even when it doesn't provide any practical benefit.
Thus, your "make it big" approach should be an entirely reasonable one with Postgres, but it may not transfer well to other, less flexible RDBMSs.
The documentation explains this:
If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
The SQL standard requires a length specification for all its types. This is probably mainly for legacy reasons. Among PostgreSQL users, the preference tends to be to omit the length specification, but if you want to write portable code, you have to include it (and pick an arbitrary size, in many cases).
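A minimal sketch of the PostgreSQL extension mentioned above (table name illustrative):

```sql
-- varchar with no length specifier accepts strings of any size,
-- effectively behaving like text.
CREATE TABLE notes (body varchar);
INSERT INTO notes VALUES (repeat('x', 100000));  -- accepted without error

SELECT char_length(body) FROM notes;  -- 100000
```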
Two more thoughts:
The Postgres docs say that 'very long values are also stored in background tables'. Thus, defining all strings as unbounded likely pushes them into background tables, which is surely a performance hit.
Declaring everything as very long interferes with the DB's efforts to predict a query execution plan, because it has less knowledge of the data.
Building a b-tree index would also be thrown off, because it could not guess a reasonable packing strategy. For example, if gender were declared as TEXT, how would the database know the values only ever contain 'M' or 'F'?
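If the domain really is that narrow, one way to hand that knowledge to the database explicitly, rather than hoping it infers anything from a declared length, is a constraint or an enum (a sketch; names are illustrative):

```sql
-- Make the two-value domain explicit instead of leaving it as bare TEXT:
CREATE TABLE patients (
    gender char(1) CHECK (gender IN ('M', 'F'))
);

-- Or as an enum type:
CREATE TYPE gender_t AS ENUM ('M', 'F');
```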