Table index design
I would like to add index(s) to my table开发者_运维问答. I am looking for general ideas how to add more indexes to a table. Other than the PK clustered. I would like to know what to look for when I am doing this. So, my example:
This table (let's call it TASK table) is going to be the biggest table of the whole application. Expecting millions records.
IMPORTANT: massive bulk-insert is adding data in this table
table has 27 columns: (so far, and counting :D )
int x 9 columns = id-s
varchar x 10 columns
bit x 2 columns
datetime x 5 columns
INT COLUMNS
all of these are INT ID-s but from tables that are usually smaller than Task table (10-50 records max), example: Status table (with values like "open", "closed") or Priority table (with values like "important", "not so important", "normal") there is also a column like "parent-ID" (self - ID)
join: all the "small" tables have PK, the usual way ... clustered
STRING COLUMNS
there is a (Company) column (string!) that is something like "5 characters long all the time" and every user will be restricted using this one. If in Task there are 15 different "Companies" the logged in user would only see one. So there's always a filter on this one. Might be a good idea to add an index to this column?
DATE COLUMNS
I think they don't index these ... right? Or can / should be?
I wouldn't add any indices - unless you have specific reasons to do so, e.g. performance issues.
In order to figure out what kind of indices to add, you need to know:
what kind of queries are being used against your table - what are the
WHERE
clauses, what kind ofORDER BY
are you doing?how is your data distributed? Which columns are selective enough (< 2% of the data) to be useful for indexing
what kind of (negative) impact do additional indices have on your INSERTs and UPDATEs on the table
any foreign key columns should be part of an index - preferably as the first column of the index - to speed up JOINs to other tables
And sure you can index a DATETIME
column - what made you think you cannot?? If you have a lot of queries that will restrict their result set by means of a date range, it can make total sense to index a DATETIME
column - maybe not by itself, but in a compound index together with other elements of your table.
What you cannot index are columns that hold more than 900 bytes of data - anything like VARCHAR(1000)
or such.
For great in-depth and very knowledgeable background on indexing, consult the blog by Kimberly Tripp, Queen of Indexing.
in general an index will speed up a JOIN, a sort operation and a filter
SO if the columns are in the JOIN, the ORDER BY or the WHERE clause then an index will help in terms of performance...but there is always a but...with every index that you add UPDATE, DELETE and INSERT operations will be slowed down because the indexes have to be maintained
so the answer is...it depends
I would say start hitting the table with queries and look at the execution plans for scans, try to make those seeks by either writing SARGable queries or adding indexes if needed...don't just add indexes for the sake of adding indexes
Step one is to understand how the data in the table will be used: how will it be inserted, selected, updated, deleted. Without knowing your usage patterns, you're shooting in the dark. (Note also that whatever you come up with now, you may be wrong. Be sure to compare your decisions with actual usage patterns once you're up and running.) Some ideas:
If users will often be looking up individual items in the table, an index on the primary key is critical.
If data will be inserted with great frequency and you have multiple indexes, over time you well have to deal with index fragmentation. Read up on and understand clustered and non-clustered indexes and fragmentation (ALTER INDEX...REBUILD).
But, if performance is key in situations when you need to retrieve a lot of rows, you might consider using your clustered indexe to support that.
If you often want a set of data based on Status, indexing on that column can be good--particularly if 1% of your rows are "Active" vs. 99% "Not Active", and all you want are the active ones.
Conversely, if your "PriorityId" is only used to get the "label" stating what PriorityId 42 is (i.e. join into the lookup table), you probably don't need an index on it in your main table.
A last idea, if everyone will always retrieve data for only one Company at a time, then (a) you'll definitely want to index on that, and (b) you might want to consider partitioning the table on that value, as it can act as a "built in filter" above and beyond conventional indexing. (This is perhaps a bit extreme and it's only available in Enterprise edition, but it may be worth it in your case.)
精彩评论