Database table column datatype design question
Sometimes a piece of data can be represented as either an integer (4 bytes) or a string. Examples: phone numbers as bigint (8 bytes), zip codes as int(4), etc. We have a large table of index-value-description triplets stored as 3 columns, with index being a 5-digit integer (not sequential), and we store it as int. A DBA is telling us that this is a poor design and that one should always use varchar for such data unless it is guaranteed to be an integer, like an auto-inc PK. Do you agree? Why or why not?
My rule of thumb is that if you do not intend to do math on it and it is not a surrogate key or an FK to a surrogate key, it is string data. Phone numbers are not integers, they are strings; the same goes for postal codes (which are not always numeric outside the US, BTW). Numbers stored as strings often have and need leading zeros (see US zip codes), and storing them as an INT or decimal would not let you enter valid values. If it is not being auto-generated, how do you know it is required to be integer data? If you are 100 percent positive that it should never be anything except an integer (and with no leading zeros), making it an int will prevent some bad data from getting in. But be really, really sure you won't need to make it string data later (such as when you internationalize and find out your postal codes are no longer numbers).
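To make the leading-zero point concrete, here is a minimal Python sketch. The helpers `store_as_int` and `store_as_text` are hypothetical stand-ins for writing to an integer column and a character column; they are not a real database API, and the sample codes are only illustrations.

```python
def store_as_int(code: str) -> int:
    """Hypothetical stand-in for saving a code into an integer column."""
    return int(code)


def store_as_text(code: str) -> str:
    """Hypothetical stand-in for saving a code into a character column."""
    return code


zip_code = "02134"  # a US ZIP code with a leading zero

as_int = store_as_int(zip_code)    # the leading zero is dropped
as_text = store_as_text(zip_code)  # stored exactly as entered

# The original can only be rebuilt if the width is known in advance.
restored = str(as_int).zfill(5)

# A non-numeric postal code cannot be stored as an integer at all.
try:
    store_as_int("SW1A 1AA")
    accepted = True
except ValueError:
    accepted = False

print(as_int, as_text, restored, accepted)
```

Padding on the way out works only while every code has the same fixed width, which is exactly the assumption that tends to break when the data is internationalized.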
To give better advice on what you are doing, I'd need a better example of the type of data you are talking about. Your table data needs are not entirely clear to me.
I would recommend against using a VARCHAR as an index column if all you need to do is represent an id. First, there is unnecessary processing overhead in indexing a VARCHAR. The overhead comes from the fact that a VARCHAR value must be transformed through a database COLLATION before comparison for indexing purposes. Second, there is no reason to have a variable-length datatype; this makes for inefficient indexing. Third, you need up to 5 times as much space to represent an integer as a VARCHAR than as an INT. This means that the index will be up to 5 times larger using a VARCHAR. Last, there is always a chance that non-numeric data will creep into the column. This would hose your index and possibly break the referential integrity of the database.
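As a rough illustration of the size and comparison points, the sketch below compares the raw payload of one value in both representations and shows how ordering differs. It is an assumption-laden approximation: it ignores length prefixes, row and page overhead, character sets, and collation rules, all of which vary by database engine, so the real ratio may differ from what is shown here. The value `12345` and the list `ids` are made-up examples.

```python
import struct

value = 12345  # hypothetical 5-digit index value

int_bytes = struct.pack("<i", value)     # fixed-width 4-byte signed integer
text_bytes = str(value).encode("ascii")  # one byte per digit, before any length prefix

int_size = len(int_bytes)
text_size = len(text_bytes)

# Integers compare numerically; strings compare character by character.
ids = [9, 10, 100, 25]
sorted_as_int = sorted(ids)
sorted_as_text = sorted(str(i) for i in ids)

print(int_size, text_size)
print(sorted_as_int)
print(sorted_as_text)
```

The ordering difference matters for range queries: a condition such as "between 10 and 25" behaves differently once the values are compared as text.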
If the DB is Oracle, the DBA could be correct. The VARCHAR2 type can be more efficient than INT for partitioning and indexes.
In addition, if you won't be doing aggregation or anything similar on the INT field, there is no gain from making it an integer.