开发者

varchar(max) everywhere?

Is there any problem with making all your Sql Server 2008 string columns varchar(max)? My allowable string sizes are managed by the application. The database should just persist what I give it. Will I take a performance hit by declaring all string columns to be of type varchar(max) in Sql Server 2008, no matter what the size of the data that actually goe开发者_开发问答s into them?


By using VARCHAR(MAX) you are basically telling SQL Server "store the values in this field how you see best", SQL Server will then choose whether to store values as a regular VARCHAR or as a LOB (Large object). In general if the values stored are less than 8,000 bytes SQL Server will treat values as a regular VARCHAR type.

If the values stored are too large then the column is allowed to spill off the page in to LOB pages, exactly as they do for other LOB types (text, ntext and image) - if this happens then additional page reads are required to read the data stored in the additional pages (i.e. there is a performance penatly), however this only happens if the values stored are too large.

In fact under SQL Server 2008 or later data can overflow onto additional pages even with the fixed length data types (e.g. VARCHAR(3,000)), however these pages are called row overflow data pages and are treated slightly differently.

Short version: from a storage perspective there is no disadvantage of using VARCHAR(MAX) over VARCHAR(N) for some N.

(Note that this also applies to the other variable-length field types NVARCHAR and VARBINARY)

FYI - You can't create indexes on VARCHAR(MAX) columns


Indexes can not be over 900 bytes wide for one. So you can probably never create an index. If your data is less then 900 bytes, use varchar(900).

This is one downside: because it gives

  • really bad searching performance
  • no unique constraints


Simon Sabin wrote a post on this some time back. I don't have the time to grab it now, but you should search for it, because he comes up with the conclusion that you shouldn't use varchar(max) by default.

Edited: Simon's got a few posts about varchar(max). The links in the comments below show this quite nicely. I think the most significant one is http://sqlblogcasts.com/blogs/simons/archive/2009/07/11/String-concatenation-with-max-types-stops-plan-caching.aspx, which talks about the effect of varchar(max) on plan caching. The general principle is to be careful. If you don't need it to be max, then don't use max - if you need more than 8000 characters, then sure... go for it.


For this question specifically a few points I don't see mentioned.

  1. On 2005/2008/2008 R2 if a LOB column is included in an index this will block online index rebuilds.
  2. On 2012 the online index rebuild restriction is lifted but LOB columns cannot participate in the new functionality Adding NOT NULL Columns as an Online Operation.
  3. Locks can be taken out longer on rows containing columns of this data type. (more)

A couple of other reasons are covered in my answer as to why not varchar(8000) everywhere.

  1. Your queries may end up requesting huge memory grants not justified by the size of data.
  2. On table with triggers it can prevent an optimisation where versioning tags are not added.


I asked the similar question earlier. got some interesting replies. check it out here There was one site that had a guy talking about the detriment of using wide columns, however if your data is limited in the application, my testing disproved it. The fact you can't create indexes on the columns means I wouldn't use them all the time (personally i wouldn't use them that much at all, but i'm a bit of a purist in that regard). However if you know there isn't much stored in them, i don't think they are that bad. If you do any sorting on columns a recordset with a varchar(max) in it (or any wide column being char or varchar), then you could suffer performance penalties. these could be resolved (if required) by indexes, but you can't put indexes on varchar(max). If you want to future proof your columns, why not just put them to something reasonable. eg a name column be 255 characters instead of max... that kinda thing.


There is another reason to avoid using varchar(max) on all columns. For the same reason we use check constraints (to avoid filling tables with junk caused by errant software or user entries), we would want to guard against any faulty process that adds much more data than intended. For example, if someone or something tried to add 3,000 bytes into a City field, we would know for certain that something is amiss and would want to stop the process dead in its tracks to debug it at the earliest possible point. We would also know that a 3000-byte city name could not possibly be valid and would mess up reports and such if we tried to use it.


Ideally, you should only allow what you need. Meaning if you're certain a particular column (say a username column) is never going to be more than 20 characters long, using a VARCHAR(20) vs. a VARCHAR(MAX) lets the database optimize queries and data structures.

From MSDN: http://msdn.microsoft.com/en-us/library/ms176089.aspx

Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes.

Are you really going ever going to come close to 2^31-1 bytes for these columns?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜