开发者

Is there a legitimate reason for using so many varchar fields? (MS SQL DB)

I'm working on data migration from an old IBM Universe-based system to a new enterprise-level data-information management system, and learning about database design in the process.

I took a look at the back-end database structure of the new system (it's a MS SQL DB, with about 100 tables), and find some things pretty odd. But I don't know if my inexperience is the reason I think that and this is just standard practice, or if these oddities are really just bad database/application design.

For example:

  • Some date fields are varchar(20)
  • Fields that store measurements are varchar(50), rather than something like a decimal and an enum to store the units of mea开发者_如何学Pythonsurement
  • ISBN 10 & 13 number fields are varchar(50)
  • Some look-up ID foreign keys are varchar(100), even though the actual look-up table primary key is an int
  • Some fields are varchar(0)
  • Additional separate fields for storing month & year, each of which is varchar(250) - I don't know what kind of design decision would require a maximum of 250 characters for the year, unless they really went overkill on their Y2K compliancy, or decided to use seconds since the beginning of the universe to store datetime

And plenty of others. The DB looks to be more than half varchar fields.

I should also mention that all the varchar fields in the DB are actually n-varchar - so it's all unicode, even the fields that only store numbers.

Is there any legitimate argument that using so many varchar fields could be the best option, in some circumstances? (flexibility... maybe...?)


It does seem odd, but it really depends on how the data is used. There could be very good reasons for the use of varchar. If there is no need to use the fields in criteria or perform calculations, using varchar would give the user a lot more freedom to do what they want.

For example, in real-estate, it seems like the price of a home should be numeric. However, many agents want to display phrases like "call for pricing", "in the low 300s", etc (though we keep a separate numeric price field for searching).

I would suggest looking at how the fields are used to determine if they should be varchar or not. If you see a lot of conversions from varchar to the type it should be, then varchar is probably not the right choice.


Some date fields are varchar(20)

This one of the things that will always get you in trouble in the future, now you can have invalid dates and then you can't do normal date arithmetic.

Some look-up ID foreign keys are varchar(100), even though the actual look-up table primary key is an int

This is bad because you will get conversions when you join and this will make it slower

Store decimals as decimals...sooner or later you will get bad data in there and then it will be a classic case of GIGO (Garbage In Garbage Out)

Also using nvarchar to store numbers is insane, you just doubled the storage needed to store those numbers, this will then store less rows per page and you will need more IO to bring back the same number of rows if had used regular varchars or integers


Some of these are clearly problems, especially the "dates as text" and "foreign keys that don't match the datatype of their related key".

The "ISBN 10 & 13 number fields as varchar(50)" isn't quite as clear-cut. Sure, it would work to store it as a BIGINT, but there are some good arguments for using a CHAR(10) or CHAR(13) instead: (even though it uses slightly more storage. Varchar(50) is clearly overkill)

  1. Will you ever need to do mathematical operations with this number? (no)
  2. Will you frequently "pretty-format" it? (00-0000-00-0 or something. Its easier to perform formatting operations on strings)
  3. Will you ever have to do a LIKE comparison? WHERE convert(varchar(13),ISBN) LIKE '%123%' is pretty ugly.

So depending on exactly how it will be used, I wouldn't have an issue using the CHAR instead. Actually, you could argue that VARCHAR(13) would make sense if a significant number of rows will have no ISBN (less storage).


Nope. I'd change it if it were mine. Do you know who made those decisions? If they're still around you can ask them.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜