Tsql - performing a join on a delimited column - performance and optimisation issue

2023-03-05 06:28 问答作者：

I have the following (slightly simplified in the columns returned) query.

select Products.Product, Products.ID, Products.Customers
from Products
where Products.orderCompleteDate is null

This would return, as an example

productA  1  Bob
productA  1  Jane
productB  2  John,Dave

Note that Customers can be a comma delimited list. What I want to add, is a column 'Customer Locations', so the above becomes

productA  1  Bob        Ireland
productA  1  Jane       Wales
produc开发者_高级运维tB  2  John,Dave  Scotland,England

I created a function below, where fn_split returns a single row per delimited item.

create FUNCTION [dbo].[GetLocations]  (@CustomerNames Varchar(256) )   

RETURNS @TempLocations table (CustomerLocations varchar(256)) AS begin
declare @NameStr varchar(256)  
declare @temp table(singleLoc varchar(256))

insert into @temp
select CustomerLocation.Location from CustomerLocation
INNER JOIN Customers ON Customers.ID = CustomerLocation.ID
INNER JOIN dbo.fn_Split(@CustomerNames,',') split ON split.Item = Customers.Name

SELECT @NameStr = COALESCE(@NameStr + ',', '') + singleLoc 
FROM @temp 

insert into @TempLocations values (@NameStr)
return
end

And applied it to the original query as follows

select Products.product, Products.ID, Products.Customers, Locations.CustomerLocations
from Products
OUTER APPLY dbo.GetLocations(Products.Customers,',') AS Locations
where Products.orderCompleteDate is null

However, this is extremely slow, with the query taking ~10seconds on a table with a mere 2000 rows (initial query runs almost instantly). This suggests that the query was unable to be optimised, and is being generated row by row. I stayed away from scalar value functions for this reason, and tried to stick to table value functions. Is there any glaring fault in my logic/code?

I'd normally suggest creating a view, based on the unnormalized table, that does the normalization, and then use that as the basis for any future queries. Unfortunately, I can't identify a PK for your current Products table, but you'd hopefully create this view using schemabinding, and hopefully be able to turn it into an indexed view (indexing on PK + customer name).

Querying this view (using Enterprise Edition, or the NOEXPAND option) should then give you comparable performance as if the normalized table existed.

One option would be to create a second table that normalises the product table and keeps it in sync with triggers that call the split function when inserting rows.

Pros are you get standard performance and easy SQL queries

Cons are potential for tables going out of sync should anything go wrong (can always schedule a job to rebuild new table from scratch periodically)

Obviously best answer would be to redesign product table but assume that's not possible for you to be messing with split functions etc.

继续阅读：join performance sql tsql

Tsql - performing a join on a delimited column - performance and optimisation issue

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？