Maximizing performance of SQL indexes for VARCHAR columns with homogeneous prefixes
I'm designing a DB2 table, one VARCHAR column of which will store an alpha-numeric product identifier. The first few characters of these IDs have very little variation. The column will be indexed, and I'm concerned that performance may suffer because of the common prefixes.
As far as I can tell, DB2 does not use hash codes for selecting VARCHARs. (At least basic DB2, I don't know about any extensions.)
If this is to be a problem, I can think of three obvious solutions.
- Create an extra, hash code column.
- Store the text backward, to ensure good distribution of initial characters.
- Break the product IDs into two columns, one containing a long enough prefix to produce better distribution in the remainder.
Each of these would be a hack, of course.
Solution #2 would provide the best key distribution. The backwards text could be stored in a separate column, or I could reverse the string after reading. Each approach involves overhead, which I w开发者_Go百科ould want to profile and compare.
With solution #3, the key distribution still would be non-optimal, and I'd need to concatenate the text after reading, or use 3 columns for the data.
If I leave my product IDs as-is, is my index likely to perform poorly? If so, what is the best method to optimize the performance?
I'm a SQL dba, not db2, but I wouldn't think that having common prefixes would hurt you at all, indexing wise.
The index pages simply store a "from" and "to" range of key values with pointers to the actual pages. The fact that an index page happens to store FrobBar001291
to FrobBar009281
shouldn't matter in the slightest to the db engine.
In fact, having these common prefixes allows the index to take advantage of other queries like:
SELECT * FROM Products WHERE ProductID LIKE 'FrobBar%'
I agree with BradC that I don't think this is a problem at all, and even if there was some small benefit to the alternatives you suggest, I imagine all the overhead and complexity would outweigh any benefits.
If you're looking to understand and improve index performance, there are a number of topics in the Info Center that you should consider (in particular the last two topics seem relevant): http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/nav/2_3_2_4_1 like:
- Index structure
- Index cleanup and maintenance
- Asynchronous index cleanup
- Asynchronous index cleanup for MDC tables
- Online index defragmentation
- Using relational indexes to improve performance
- Relational index planning tips
- Relational index performance tips
精彩评论