开发者

SQL Server 2005, wide indexes, computed columns, and sargable queries

In my database, assume we have a table defined as follows:

CREATE TABLE [Chemical](
    [ChemicalId] int NOT NULL IDENTITY(1,1) PRIMARY KEY,
    [Name] nvarchar(max) NOT NULL,
    [Description] nvarchar(max) NULL
)

The value for Name can be very large, so we must use nvarchar(max). Unfortunately, we want to create an index on this column, but nvarchar(max) is not supported inside an index.

So we create the following computed column and associated index based upon it:

ALTER TABLE [Chemical]
ADD [Name_Indexable] AS LEFT([Name], 20)

CREATE INDEX [IX_Name] 
ON [Chemical]([Name_Indexable]) 
INCLUDE([Name])

The index will not be unique but we can enforce uniqueness via a trigger.

If we perform the following query, the execution plan results in a index scan, which is not what we want:

SELECT [ChemicalId], [Name], [Description] 
FROM [Chemical] 
WHERE [Name]='[1,1''-Bicyclohexyl]-2-carboxylic acid, 4'',5-dihydroxy-2'',3-dimethyl-5'',6-bis[(1-oxo-2-propen-1-yl)oxy]-, methyl ester'

However, if we modify the query to make it "sargable," then the execution plan results in an index seek, which is what we want:

SELECT [ChemicalId], [Name], [Description] 
FROM [Chemical] 
WHERE [Indexable_Name]='[1,1''-Bicyclohexyl]-' AND [Name]='[1,1''-Bicyclohexyl]-2-carboxylic acid, 4'',5开发者_开发知识库-dihydroxy-2'',3-dimethyl-5'',6-bis[(1-oxo-2-propen-1-yl)oxy]-, methyl ester'

Is this a good solution if we control the format of all queries executed against the database via our middle tier? Is there a better way? Is this a major kludge? Should we be using full-text indexing?


Your index is on name_indexable, not on name. Since name_indexable is generated from a function involving name instead of directly on the column name, the optimizer won't automatically use the index when your where clause includes a reference to name. You must search on name_indexable in order to use the index. Since you have a middle tier, your best bet is probably to provide a function that searches on name_indexable if the given name is <= 200 characters, and otherwise searches on both.


Make the Name_Index column a persisted computed column and the primary key and enforce uniqueness by appending the ChemicalId instead of relying on triggers.

CREATE TABLE dbo.[Chemical]
    ([ChemicalId] int NOT NULL IDENTITY(1,1), 
    [Name] Nvarchar(max) NOT NULL, 
    [Description] Nvarchar(max) NOT NULL,
    [Name_Index] AS (CONVERT(VARCHAR(20), LEFT([Name], 20)) + CONVERT(VARCHAR(20), [ChemicalId])) PERSISTED PRIMARY KEY);


IMHO, yes, I think this is a bad approach. If you knew that the first 20 characters will be unique, then it should be a first class column with a unique constraint. If you want to have better searching on the Name column, then using full-text search is the right way to go. If you want to ensure that the varchar(max) column is unique, then create a computed column that generates a hash off the value and place a unique constraint off that.

Alter Table Add NameHash Hashbytes('SHA1', [Name])

ADDITION

Given our discussion, if your searches are always going to be on a exact match, then you could hash your search parameter and compare it to NameHash above. However, the catch is that the match must be an exact match (i.e. case-sensitive).

I still content that the FTS will be your best bet. Even though there is overhead in breaking up your text into words, FTS is the tool best designed for doing searches against large amounts of text. The longer your search criteria, the more exact it will be, the faster the search.


Did you try

WHERE [Name_Indexable]='1,2,3-Propanetriol'

After all that is where your index is created on


I find your solution from the question (the last query) very good, but I personally prefer to say SQL more exactly what and how I want to do. So if you works with Microsoft SQL Server or with some other SQL Server which supports CTE (common table expression) you can rewrite your query like following:

DECLARE @data nvarchar(max);

SET @data = '[1,1''-Bicyclohexyl]-2-carboxylic acid, 4'',5-dihydroxy-2'',3-dimethyl-5'',6-bis[(1-oxo-2-propen-1-yl)oxy]-, methyl ester';

WITH ReduceData ([ChemicalId], [Name], [Description]) AS (
    SELECT [ChemicalId], [Name], [Description] 
    FROM [dbo].[Chemical]
    WHERE [Name_Indexable]=LEFT(@data,20)
)
SELECT [ChemicalId], [Name], [Description] 
FROM ReduceData
WHERE [Name]=@data

(In the real implementation you probably don't need to define @data. Instead you can just use a parameterized query.). What I suggest is just to say SQL more explicit what you want. All CTE Queries can be very good optimized.

It can be that your original query will be compiled to the absolutely the same execution plan as my CTE version. You can look at the both plans and compare there. In your project you have probably much more complex queries as from your question. If you'll use more CTE, your SQL code will be easy to read, it can be very good optimized and you can be sure that SQL Server do exactly what you want.

UPDATED: By the way the line

ALTER TABLE [Chemical]
    ADD [Name_Indexable] AS LEFT([Name], 20)

should be changed to

ALTER TABLE [Chemical]
    ADD [Name_Indexable] AS CAST(LEFT([Name], 20) AS varchar(20)) PERSISTED

to make a [Name_Indexable] column of the type varchar(20) on the Microsoft SQL Server 2008 and mark it PERSISTED to stores the computed values in the table, and updates them when any other columns on which the computed column depends are updated


Fix your data model. You have a comma-delimited list in the name column, to me that means you would better be able to query if you had a related table. Your name appears to be a list of ingredients not a name.

If this really truly is a real name, then Registered User has a good plan.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜