Tsql - performing a join on a delimited column - performance and optimisation issue
I have the following (slightly simplified in the columns returned) query.
select Products.Product, Products.ID, Products.Customers
from Products
where Products.orderCompleteDate is null
This would return, as an example
productA 1 Bob
productA 1 Jane
productB 2 John,Dave
Note that Customers can be a comma delimited list. What I want to add, is a column 'Customer Locations', so the above becomes
productA 1 Bob Ireland
productA 1 Jane Wales
produc开发者_高级运维tB 2 John,Dave Scotland,England
I created a function below, where fn_split returns a single row per delimited item.
create FUNCTION [dbo].[GetLocations] (@CustomerNames Varchar(256) )
RETURNS @TempLocations table (CustomerLocations varchar(256)) AS begin
declare @NameStr varchar(256)
declare @temp table(singleLoc varchar(256))
insert into @temp
select CustomerLocation.Location from CustomerLocation
INNER JOIN Customers ON Customers.ID = CustomerLocation.ID
INNER JOIN dbo.fn_Split(@CustomerNames,',') split ON split.Item = Customers.Name
SELECT @NameStr = COALESCE(@NameStr + ',', '') + singleLoc
FROM @temp
insert into @TempLocations values (@NameStr)
return
end
And applied it to the original query as follows
select Products.product, Products.ID, Products.Customers, Locations.CustomerLocations
from Products
OUTER APPLY dbo.GetLocations(Products.Customers,',') AS Locations
where Products.orderCompleteDate is null
However, this is extremely slow, with the query taking ~10seconds on a table with a mere 2000 rows (initial query runs almost instantly). This suggests that the query was unable to be optimised, and is being generated row by row. I stayed away from scalar value functions for this reason, and tried to stick to table value functions. Is there any glaring fault in my logic/code?
I'd normally suggest creating a view, based on the unnormalized table, that does the normalization, and then use that as the basis for any future queries. Unfortunately, I can't identify a PK for your current Products table, but you'd hopefully create this view using schemabinding
, and hopefully be able to turn it into an indexed view (indexing on PK + customer name).
Querying this view (using Enterprise Edition, or the NOEXPAND option) should then give you comparable performance as if the normalized table existed.
One option would be to create a second table that normalises the product table and keeps it in sync with triggers that call the split function when inserting rows.
Pros are you get standard performance and easy SQL queries
Cons are potential for tables going out of sync should anything go wrong (can always schedule a job to rebuild new table from scratch periodically)
Obviously best answer would be to redesign product table but assume that's not possible for you to be messing with split functions etc.
精彩评论