开发者

What kind of overhead do non clustered indexes add?

If you are talking about btrees, I wouldn't imagine that the additional overhead of a non clustered index (not counting stuff like full text search or other kind of string indexing) is even measurable, except for an extremely high volume high write scenario.

What kind of overhead are we actually talking about? Why would it be a bad idea to just index everything? Is this implementation specific? (in that case, I am mostly interested in answers around pg)

EDIT: To explain the reasoning behind this a bit more...

We are looking to specifically improve performance right now across the board, and one of the key things we are looking at is query performance. I have read the things mentioned here, that indexes will increase db size on disk and will slow down writes. The question came up today when one pair did some pre-emptive indexing on a new table, since we usually apply indexes in a more reactive way. Their arguement was that they weren't indexing string fields, and they weren't doing clustered indexes, so the negative impact of possibly redundant indexes should barely be measurable.

Now, I am far from an expert in such things, and those arguments made a lot of sense to me based on what I understand.

Now, I am sure there are other reasons, or I am misunderstanding something. I know a redundant ind开发者_开发百科ex will have a negative effect, what I want to know is how bad it will be (because it seems negligible). The whole indexing every field thing is a worst case scenario, but I figured if people could tell me what that will do to my db, it will help me understand the concerns around being conservative with indexing, or just throwing them out there when it has a possibility of helping things.


Random thoughts

  • Indexes benefit reads of course
  • You should index where you get the most bang for your buck
  • Most DBs are > 95% read (think about updates, FK checks, duplicate checks etc = reads)
  • "Everything" is pointless: most indexed need to be composite with includes
  • Define high volume we have 15-20 million new rows per day with indexes


Introduction to Indices

In short, an index, whether clustered or non-, adds extra "branches" to the "tree" in which data is stored by most current DBMSes. This makes finding values with a single unique combination of the index logarithmic-time instead of linear-time. This reduction in access time speeds up many common tasks the DB does; however, when performing tasks other than that, it can slow it down because the data must be accessed through the tree. Filtering based on non-indexed columns, for instance, requires the engine to iterate through the tree, and because the ratio of branch nodes (containing only pointers to somewhere else in the tree) to leaf nodes has been reduced, this will take longer than if the index were not present.

In addition, non-clustered indices separate data based on column values, but if those column values are not very unique across all table rows (like a flag indicating "yes" or "no"), then the index adds an extra level of complexity that doesn't actually help the search; in fact, it hinders it because in navigating from root to leaves of the tree, an extra branch is encountered.


I am sure the exact overheard is probably implementation specific, but off the top of my head some points:

  • Increased Disk Space requirements.
  • All writes (inserts, updates, deletes) cost more as all indexes must be updated.
  • Increased transaction locking overheard (all indexes must be updated within a transaction, leading to more locks being required, etc).
  • Potentially increased complexity for the query optimizer (choosing which index is most likely to perform best; Also potential for one index to be chosen when another index would actually be better).
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜