Table index design

I would like to add one or more indexes to my table. I am looking for general guidance on adding indexes to a table beyond the clustered PK, and on what to look for when doing so. Here is my example:

This table (let's call it the TASK table) is going to be the biggest table in the whole application. I expect millions of records.

IMPORTANT: data is added to this table by massive bulk inserts.

table has 27 columns: (so far, and counting :D )

int x 9 columns = id-s

varchar x 10 columns

bit x 2 columns

datetime x 5 columns

INT COLUMNS

All of these are INT IDs that reference tables usually much smaller than the Task table (10-50 records max). Examples: a Status table (with values like "open", "closed") or a Priority table (with values like "important", "not so important", "normal"). There is also a "parent-ID" column (a self-reference to the Task ID).

Joins: all the "small" tables have a PK, clustered in the usual way.

STRING COLUMNS

There is a Company column (a string!) that is always about 5 characters long, and every user will be restricted by it. If Task contains 15 different companies, the logged-in user would see only one of them. So there is always a filter on this column. Might it be a good idea to add an index on it?

DATE COLUMNS

I think these are not usually indexed ... right? Or can they be, and should they be?


I wouldn't add any indices - unless you have specific reasons to do so, e.g. performance issues.

In order to figure out what kind of indices to add, you need to know:

  • what kind of queries are being used against your table - what are the WHERE clauses, what kind of ORDER BY are you doing?

  • how is your data distributed? Which columns are selective enough (a typical filter value matches < 2% of the rows) to be useful for indexing?

  • what kind of (negative) impact do additional indices have on your INSERTs and UPDATEs on the table?

  • any foreign key columns should be part of an index - preferably as the first column of the index - to speed up JOINs to other tables
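To make the "selective enough" check from the list above concrete, here is a minimal sketch in Python. The sample data, the `selectivity` helper, and the way the 2% rule of thumb is applied per value are all assumptions for illustration; on a real table you would get the same numbers with a `GROUP BY` count on the column.

```python
from collections import Counter


def selectivity(values):
    """Return {value: fraction of rows holding that value}."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical sample of a status column: 1,000 rows, 10 of them 'open'.
status_column = ["open"] * 10 + ["closed"] * 990

fractions = selectivity(status_column)

# Values matching fewer than 2% of the rows are candidates for an index lookup.
selective_values = [value for value, frac in fractions.items() if frac < 0.02]

print(fractions)
print(selective_values)
```

Filtering on 'open' touches a tiny slice of the table, so an index can pay off; filtering on 'closed' returns almost everything, and a scan is likely cheaper no matter what you index.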

And sure you can index a DATETIME column - what made you think you cannot?? If you have a lot of queries that will restrict their result set by means of a date range, it can make total sense to index a DATETIME column - maybe not by itself, but in a compound index together with other elements of your table.
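A rough sketch of such a compound index, using Python's built-in `sqlite3` module as a stand-in engine so it runs anywhere. The table layout, the column names (`company`, `due_date`) and the index name are made up for illustration, and SQLite's planner is not SQL Server's, so treat the plan output only as an analogy to what you would see in a SQL Server execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE task (
           id INTEGER PRIMARY KEY,
           company TEXT NOT NULL,
           due_date TEXT NOT NULL,   -- ISO dates sort correctly as text
           title TEXT)"""
)

# Equality column first, range column second.
conn.execute("CREATE INDEX ix_task_company_due ON task (company, due_date)")

# Hypothetical data: 15 companies, dates spread over 12 months.
rows = [
    (f"C{n % 15:04d}", f"2024-{n % 12 + 1:02d}-15", f"task {n}")
    for n in range(3000)
]
conn.executemany(
    "INSERT INTO task (company, due_date, title) VALUES (?, ?, ?)", rows
)

query = (
    "SELECT id, title FROM task "
    "WHERE company = ? AND due_date >= ? AND due_date < ?"
)
params = ("C0003", "2024-03-01", "2024-07-01")

# The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
matches = conn.execute(query, params).fetchall()

print(plan)
print(len(matches))
```

The column order matters: the equality filter (company) narrows the index to one contiguous block, and the date range is then a slice within that block.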

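What you cannot use as index key columns are the MAX/LOB types (VARCHAR(MAX), TEXT and the like). Index keys are also limited in size: 900 bytes for a clustered index key (nonclustered keys can go up to 1700 bytes from SQL Server 2016 on). A column like VARCHAR(1000) can be indexed, but you get a warning when creating the index, and inserting a value that exceeds the limit will fail.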

For great in-depth and very knowledgeable background on indexing, consult the blog by Kimberly Tripp, Queen of Indexing.


In general an index will speed up a JOIN, a sort operation and a filter.

So if the columns are in the JOIN, the ORDER BY or the WHERE clause, then an index will help in terms of performance...but there is always a but...with every index that you add, UPDATE, DELETE and INSERT operations will be slowed down because the indexes have to be maintained.

So the answer is...it depends.

I would say start hitting the table with queries and look at the execution plans for scans. Try to turn those into seeks, either by writing SARGable queries or by adding indexes if needed...don't just add indexes for the sake of adding indexes.
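To show what SARGable means in practice, here is a small sketch using Python's built-in `sqlite3` as a stand-in (the table, the `created` column and the index name are hypothetical). SQLite reports SCAN and SEARCH where SQL Server plans would show scans and seeks, so the comparison is an analogy, not SQL Server output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, created TEXT NOT NULL, title TEXT)"
)
conn.execute("CREATE INDEX ix_task_created ON task (created)")


def plan_for(sql):
    """Return the readable steps of SQLite's query plan as one string."""
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in steps)


# Not SARGable: the indexed column is wrapped in a function,
# so the index on 'created' cannot be used for a lookup.
non_sargable = plan_for(
    "SELECT id, title FROM task WHERE strftime('%Y', created) = '2024'"
)

# SARGable: the bare column is compared against a range of constants.
sargable = plan_for(
    "SELECT id, title FROM task "
    "WHERE created >= '2024-01-01' AND created < '2025-01-01'"
)

print(non_sargable)
print(sargable)
```

Both queries ask for the same rows; only the second one lets the engine use the index. The same rewrite (range on the bare column instead of a function around it) applies to things like YEAR(col) = 2024 in T-SQL.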


Step one is to understand how the data in the table will be used: how will it be inserted, selected, updated, deleted. Without knowing your usage patterns, you're shooting in the dark. (Note also that whatever you come up with now, you may be wrong. Be sure to compare your decisions with actual usage patterns once you're up and running.) Some ideas:

If users will often be looking up individual items in the table, an index on the primary key is critical.

If data will be inserted with great frequency and you have multiple indexes, over time you will have to deal with index fragmentation. Read up on and understand clustered and non-clustered indexes and fragmentation (ALTER INDEX...REBUILD).

But, if performance is key in situations when you need to retrieve a lot of rows, you might consider using your clustered index to support that.

If you often want a set of data based on Status, indexing on that column can be good--particularly if 1% of your rows are "Active" vs. 99% "Not Active", and all you want are the active ones.
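For the "1% Active" case, SQL Server (2008 and later) lets you put a WHERE clause on the index itself, a so-called filtered index, so only the rows you care about are indexed. Below is a rough sketch of the idea using SQLite's equivalent (a partial index) through Python's built-in `sqlite3`. The table, the `status` column and the index name are invented for illustration, and the plan text is SQLite's, not SQL Server's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT NOT NULL, title TEXT)"
)

# Index only the rare rows; the 99% 'Not Active' rows add no index entries,
# which keeps the index small and cheap to maintain during bulk inserts.
conn.execute(
    "CREATE INDEX ix_task_active ON task (status) WHERE status = 'Active'"
)

# Hypothetical data: 1 in 100 rows is 'Active'.
conn.executemany(
    "INSERT INTO task (status, title) VALUES (?, ?)",
    [("Active" if n % 100 == 0 else "Not Active", f"task {n}") for n in range(1000)],
)


def plan_for(sql):
    """Return the readable steps of SQLite's query plan as one string."""
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in steps)


active_plan = plan_for("SELECT id, title FROM task WHERE status = 'Active'")
inactive_plan = plan_for("SELECT id, title FROM task WHERE status = 'Not Active'")
active_count = conn.execute(
    "SELECT COUNT(*) FROM task WHERE status = 'Active'"
).fetchone()[0]

print(active_plan)
print(inactive_plan)
print(active_count)
```

The query for the rare value can use the small index, while the query for the common value falls back to reading the table, which is what you would want anyway.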

Conversely, if your "PriorityId" is only used to get the "label" stating what PriorityId 42 is (i.e. join into the lookup table), you probably don't need an index on it in your main table.

A last idea: if everyone will always retrieve data for only one Company at a time, then (a) you'll definitely want to index on that, and (b) you might want to consider partitioning the table on that value, as it can act as a "built in filter" above and beyond conventional indexing. (This is perhaps a bit extreme, and before SQL Server 2016 SP1 it's only available in Enterprise edition, but it may be worth it in your case.)
