Does having more attributes in a table reduce performance?

So far I am quite comfortable working with C# Windows applications. I am about to shift to ASP.NET to develop a website. The requirements have forced me to put around 50 columns in a single table. I know the concept of breaking it into smaller tables using normal forms.

I tried googling but didn't get many results. I need to know whether a table with 50 attributes would decrease the performance of my web application. Can somebody advise me on this?


Well, if you bring them all back you certainly have the network costs (data transfer between the DB and your .NET code) and materialization costs (creating the object / DataTable representation in your DAL). Either way you have to consider the page size of the database.

But perhaps the key question is: do you need all the data? If so, there is only so much you can do. Splitting the data across multiple tables and introducing joins will also impact performance.

In most cases though, and especially in ASP.NET, the most important figures are things like:

  • what volume of data actually goes to the client application? (could you use paging / AJAX, etc.? see the paging sketch after this list)
  • what data is stored in session?
  • how many round-trips do you make between client and app-server?

since bandwidth and latency between your app-server and the client tend to be the pinch point. Measure things; make sure you are targeting the right performance concerns.
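
A minimal sketch of server-side paging for the first point (table and column names are hypothetical, and the OFFSET/FETCH syntax requires SQL Server 2012 or later):

```sql
-- Return only one page of rows to the web tier instead of the whole table.
SELECT CustomerID, Name, City
FROM dbo.Customer
ORDER BY CustomerID
OFFSET 40 ROWS FETCH NEXT 20 ROWS ONLY;  -- skip 2 pages of 20, take the next 20
```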

Also, ASP.NET has a reputation for not being shy about storing more than you expect in things like view state; keep an eye on this, or switch to a lighter-weight model like ASP.NET MVC.


If you are talking about a DB table with many fields (or columns), 50 is not really out of the ordinary.

However, you should keep the DB design normalized, and if the normalized design genuinely calls for 50 fields, go with that.


One table, 50 columns?

One point of normalisation is to avoid insert, delete, and update anomalies.

Now, it will run like a dog with more than a handful of rows, but data integrity trumps performance here...
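
For context, a minimal sketch (with hypothetical table and column names) of the update anomaly normalisation protects you from, where the same fact is repeated on many rows:

```sql
-- Denormalized: customer details are repeated on every order row.
CREATE TABLE dbo.OrderFlat (
    OrderID      INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    CustomerCity VARCHAR(100),  -- repeated per order: classic update anomaly
    Amount       DECIMAL(10, 2)
);

-- If the customer moves, every one of their rows must be updated;
-- miss one and the data is silently inconsistent.
UPDATE dbo.OrderFlat
SET CustomerCity = 'Boston'
WHERE CustomerName = 'Acme Corp';
```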


Rather than thinking about how many columns there are, I suggest you think about the data types of those columns.

Denormalization is also popular. Choose your level of normalization depending on your application logic (think about JOINs carefully).
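
As a hypothetical illustration of that join cost: once a wide Person table is split in two, every query needing columns from both halves pays for a join (all names below are made up):

```sql
-- Two tables instead of one means a join on every combined read.
SELECT p.PersonID, p.FirstName, d.Biography
FROM dbo.Person AS p
JOIN dbo.PersonDetail AS d
    ON d.PersonID = p.PersonID
WHERE p.PersonID = 42;
```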


It all boils down to the kind of queries you will be running. At the end of the day, what matters is the amount of data you fetch from or write to the table. If a majority of your queries read from or write to a majority of the columns, then yes, the number of columns will have an impact.

The turnaround time of a query is roughly proportional to the amount of data it reads or writes: the more data, the longer it takes. A large number of columns can mean a large amount of data (but not always).

Having said that, 50 columns is not a big number; I have come across tables with more than 300 columns. But it also depends on the DBMS you use.


That depends on what data is retrieved and how. A SELECT * will certainly tell on performance. Select only the columns you actually need and filter with a WHERE clause; that is one way to work with a table that has many columns and a lot of data.
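
A minimal sketch of the idea, with hypothetical table and column names:

```sql
-- Avoid this on a 50-column table:
-- SELECT * FROM dbo.Customer;

-- Instead, name only the columns the page needs and filter with WHERE:
SELECT CustomerID, FirstName, Email
FROM dbo.Customer
WHERE City = 'Chicago';
```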


"Do the simplest thing that could possibly work." (Ward Cunningham).

If the columns all represent separate elements of data, you're following good normalization rules, and you don't have groups of repeating elements, then the number of columns in the table really doesn't matter much. If you want, you can start sweating over the size of the rows vs. the size of a data block, how much space you may or may not be wasting, etc., ad nauseam, but in my experience it's better to keep your data together in a single table unless there's some overwhelming functional reason why it should be broken into multiple tables. I've unfortunately had to work with databases where someone presupposed that too many fields in a single table was a Bad Thing, so they broke what was logically a single table into multiple tables with fewer fields. This made for a nightmare when trying to do updates.

Good luck.


Depending on what you actually mean, this might or might not be a problem.

If the 50 columns are relatively small in size and each contains a different type of data (phone, city, state, first name, etc.), you are probably fine.

If they are things like Telephone1, Telephone2, etc., you need a related table, because that design is difficult to maintain and to query properly. For instance, suppose you have fifty phone number fields now and the day comes when you need 51: you have to change the table structure and all the related queries (you aren't using SELECT * in production, are you?). And suppose you want to know who has telephone number 111-111-1111: you have to test all 50 columns (or UNION 50 SELECTs) to get the answer. This is where it can hurt performance; a related table avoids both problems, as sketched below.
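
A minimal sketch of such a related table (names are hypothetical), which turns the 50-column search into a single lookup:

```sql
-- One row per phone number instead of Telephone1..Telephone50 columns.
CREATE TABLE dbo.PersonPhone (
    PersonID    INT         NOT NULL REFERENCES dbo.Person (PersonID),
    PhoneNumber VARCHAR(20) NOT NULL,
    PRIMARY KEY (PersonID, PhoneNumber)
);

-- "Who has 111-111-1111?" is now one simple, indexable query.
SELECT PersonID
FROM dbo.PersonPhone
WHERE PhoneNumber = '111-111-1111';
```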

The third case is where the 50 columns are each different things but together make for a large record because of the size of the fields. Understand that databases will let you create a structure that is larger than the maximum number of bytes a record can contain; they simply won't let you put more than that number of bytes into the record. Long records also tend to create issues in how the data is stored on disk and may result in slower retrieval. In this case, it is best to create one or more related tables with a one-to-one relationship to the main table.
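
A minimal sketch of that one-to-one split (hypothetical names): the narrow, frequently read columns stay in the main table, and the bulky fields move to a companion table sharing the same key:

```sql
CREATE TABLE dbo.Person (
    PersonID  INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName  VARCHAR(50)
);

-- One-to-one companion table for the large, rarely read fields.
CREATE TABLE dbo.PersonExtended (
    PersonID INT PRIMARY KEY REFERENCES dbo.Person (PersonID),
    Resume   VARCHAR(MAX),
    Photo    VARBINARY(MAX)
);
```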


The thing that matters is not the number of columns in a table but the "width" of the table.

For example, if all 50 of those columns are bit columns, then you are looking at about 7 bytes of data per row, which is tiny.

On the other hand, if all 50 columns are VARCHAR(4000) columns, then you are looking at a potential maximum row size of around 200 KB (yes, SQL Server will let you declare this), which could obviously cause problems (actually it probably won't, but my point is that the width of the data matters, not the number of columns).
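
As a hypothetical illustration, SQL Server accepts a definition like this even though the potential row size far exceeds the 8,060-byte in-row limit; variable-length values that don't fit in-row are pushed to row-overflow pages:

```sql
CREATE TABLE dbo.WideTable (
    ID    INT PRIMARY KEY,
    Col01 VARCHAR(4000),
    Col02 VARCHAR(4000)
    -- ...and so on up to Col50: ~200 KB of potential width
);
```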

The only sure-fire way of knowing whether or not you are going to have problems is to try it and see, but as a very general rule it's a good idea to keep the row size below 8 KB (1 page in SQL Server). However, this is only a very general rule because:

  • Normally you probably want your row size to be much less than this so that you can fit many rows onto a page.
  • However, if you have a couple of large-object fields (such as a large VARCHAR or VARCHAR(MAX)) in your table, row sizes will probably exceed 8 KB fairly regularly, and this is perfectly OK.

It's a complicated subject; like I said, the only sure-fire way of knowing is to try it and see how it performs.

Note that, with the exception of large-object and row-overflow data (such as large VARCHAR values), SQL Server won't let you create a row larger than 1 page (about 8,060 usable bytes).


Why might a "wide" table cause problems?

Because it increases the amount of data that needs to be read.

As a very simple / contrived example, suppose you have a table ordered by ID (i.e. it has a clustered index on ID) and you want to retrieve the records for IDs 101 to 110 inclusive. If the row size is small (say 200 bytes), then the combined size of all those records is around 2 KB, which is much less than the page size (8 KB). Because the table is ordered by ID, it is very likely that these records all fit onto 1 page, at most 2, so only a couple of reads are needed to fetch all 10 records.

Now suppose the row size is larger, say 2 KB; the total size of those records is now 20 KB. The minimum number of page reads needed is now 3, possibly 4. On a busy database server these extra reads add up to extra I/O and extra memory pressure on the cache.
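
Roughly the scenario above in SQL (hypothetical names; the fixed-width CHAR(2000) column gives every row about 2 KB of payload):

```sql
CREATE TABLE dbo.Orders (
    ID      INT        NOT NULL,
    Details CHAR(2000) NOT NULL   -- ~2 KB per row
);
CREATE CLUSTERED INDEX IX_Orders_ID ON dbo.Orders (ID);

-- 10 rows x ~2 KB = ~20 KB, so at least 3 pages (8 KB each) must be read.
SELECT ID, Details
FROM dbo.Orders
WHERE ID BETWEEN 101 AND 110;
```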

Large objects

Depending on the amount of data stored, large-object and variable-length fields (such as VARCHAR) may store their data in separate pages, either LOB pages or row-overflow pages.

What does this mean? Well, if you have a table with many such columns defined and you execute a SELECT * ... query, then SQL Server needs to retrieve all those extra pages to read all that extra data. We end up in the same situation as above: lots of reads, which is bad.

However, if we instead specify only some of the columns in our query, e.g. SELECT ID, Address ..., then SQL Server doesn't need to bother reading the pages that contain data we aren't interested in. Even though the table may define many columns with a huge total row width, because we named only the columns we care about, and because the rest of the data lives in separate pages, the number of reads needed stays relatively low.
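
One way to verify this for yourself in SQL Server is SET STATISTICS IO ON, which reports the logical reads each query performs (table and column names below are hypothetical):

```sql
SET STATISTICS IO ON;

-- Reads the LOB / row-overflow pages for every large column.
SELECT * FROM dbo.Person WHERE ID = 42;

-- Skips pages holding columns we didn't ask for: compare the logical reads.
SELECT ID, Address FROM dbo.Person WHERE ID = 42;

SET STATISTICS IO OFF;
```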
