Database table schema design - varchar(n). Suitable choice of N

Coming from a C background, I may be getting too anal about this and worrying unnecessarily about bits and bytes here.

Still, I can't help thinking about how the data is actually stored, and that if I choose an N which is a power of 2, the database will be more efficient in how it packs data etc.

Using this "logic", I have a string field in a table which is a variable length up to 21 chars. I am tempted to use 32 instead of 21, for the reason given above - however now I am thinking that I am wasting disk space because there will be space allocated for 11 extra chars that are guaranteed to be never used. Since I envisage storing several tens of thousands of rows a day, it all adds up.

Question:

Mindful of all of the above, should I declare varchar(21) or varchar(32), and why?

[Edit]

The data being stored conforms to an external specification, and can never be more than 21 chars long. I am using both MySQL and PostgreSQL, but ideally, I want the answer to be database agnostic, since I try to not get tied down by any particular vendor.


Let the database implementation do the optimization. Use the smallest size that makes sense for the application.

Performance is generally affected most by how many disk operations are necessary, and the smaller the data, the fewer the disk operations. Some databases will do compression or common prefix optimizations to keep the number of disk bytes used to a minimum.


If you allow the column to store more than 21 characters, then some day some joker (or maybe just a program bug) may come along and load a value that is more than 21 characters, and fun and games will ensue. If an invalid-length value can never be stored in the table, then a query against the table will never return one.
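To illustrate the point about rejecting over-length values, here is a minimal sketch using Python's built-in sqlite3 module. Note the assumptions: SQLite does not enforce the n in varchar(n) by itself, so an explicit CHECK constraint stands in for the length limit that MySQL (in strict mode) or PostgreSQL would apply to the column type. The table and column names are made up for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK constraint plays the role of the
# varchar(21) limit, because SQLite ignores the declared length.
conn.execute(
    "CREATE TABLE items (code VARCHAR(21) CHECK (length(code) <= 21))"
)

# A 21-character value fits the limit and is accepted.
conn.execute("INSERT INTO items (code) VALUES (?)", ("A" * 21,))

# A 22-character value violates the constraint and is rejected.
try:
    conn.execute("INSERT INTO items (code) VALUES (?)", ("A" * 22,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(rejected, row_count)
```

Only the valid row ends up in the table, so no later query can return an over-length value.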

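Oh, and a varchar(x) value takes the actual length of the stored string plus a small length prefix, so at most about (x + 2) bytes per row/column for single-byte characters. The prefix is 2 bytes in SQL Server and 1 or 2 bytes in MySQL, and it records the length of the string actually being stored in that row/column.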
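As a rough back-of-the-envelope sketch of that overhead, the snippet below models a variable-length column as "bytes of the stored string plus a length prefix". The 2-byte prefix and the helper name varchar_bytes are assumptions for illustration only; the real prefix size, encoding, and row layout differ between databases.

```python
def varchar_bytes(value: str, declared_n: int, prefix_bytes: int = 2) -> int:
    """Estimate the bytes used by one varchar(declared_n) value.

    Simplified model (an assumption, not any vendor's exact layout):
    the encoded payload plus a fixed-size length prefix.
    """
    if len(value) > declared_n:
        raise ValueError("value exceeds the declared column length")
    return len(value.encode("utf-8")) + prefix_bytes


longest = "A" * 21  # the longest value the external specification allows

# Under this model the declared maximum does not change the stored size.
size_in_varchar_21 = varchar_bytes(longest, declared_n=21)
size_in_varchar_32 = varchar_bytes(longest, declared_n=32)
size_of_short_value = varchar_bytes("ABC", declared_n=21)

print(size_in_varchar_21, size_in_varchar_32, size_of_short_value)
```

Under these assumptions, declaring varchar(32) instead of varchar(21) does not reserve 11 unused characters per row; it only loosens the limit on what may be stored.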


Data is stored by row, so it's not just the length of this field that will determine where the boundaries come. And if the row isn't full, SQL is quite capable of leaving empty space. Let SQL Server do its job, and define the field length based on the business requirements.


I can only speak for SQL Server, but if you are always going to have exactly 21 characters you really should use char(21) instead of varchar(21). There are various reasons, such as:

  1. varchar uses 2 extra bytes of storage per row for its length header.
  2. Using char ensures all rows are of equal length, meaning that it is slightly faster to locate data.
  3. varchar columns have additional offset overhead when doing inserts into the table.

etc.


Simply define the maximum you'll possibly need.

See MSDN:

Storage size is the actual length in bytes of the data entered, not n bytes.

The n is there only to prevent you from entering more than n characters. It's a limit on the DB user.


What are the business rules for the field that you are trying to declare? If it never goes above 21, go right ahead. But if you are not sure and the business wants you to have some leeway, then use 32.

Refer to this link


varchar(n) takes up only the length of the data actually stored in the column when that data is less than n characters long.
