Bizarre performance issue: Common Table Expressions in inline User-Defined Function

2022-12-16 14:01 问答作者：

Here's a brain-twister for the SQL guys - can anyone think of a reason why the first of these functions performs fine, and the second one runs dog-slow?

Function A - Typically finishes in ~5 ms

CREATE FUNCTION dbo.GoodFunction
(
    @IDs UniqueIntTable READONLY
)
RETURNS TABLE
AS RETURN
    SELECT p.ID, p.Node, p.Name, p.Level
    FROM
    (
        SELECT DISTINCT a.Ancestor AS Node
        FROM Hierarchy h
        CROSS APPLY dbo.GetAncestors(h.Node.GetAncestor(1)) a
        WHERE h.ID IN (SELECT Value FROM @IDs)
    ) np
    INNER JOIN Hierarchy p
    ON p.Node = np.Node

Function B - Runs extremely slow - I gave up after 5 minutes

CREATE FUNCTION dbo.BadFunction
(
    @IDs UniqueIntTable READONLY
)
RETURNS TABLE
AS RETURN
    WITH Ancestors_CTE AS
    (
        SELECT DISTINCT a.Ancestor AS Node
        FROM Hierarchy c
        CROSS APPLY dbo.GetAncestors(c.Node.GetAncestor(1)) a
        WHERE c.ID IN (SELECT Value FROM @IDs)
    )
    SELECT p.ID, p.Node, p.Name, p.Level
    FROM Ancestors_CTE ac
    INNER JOIN Hierarchy p
    ON p.Node = ac.Node

I'll explain below what this function does, but before I get into that, I want to point out that I don't think it's important, because as far as I can tell, these two functions are exactly the same! The only difference is that one uses a CTE and one uses a subquery; the contents of the subquery in A and the CTE in B are identical.

In case anyone decides this matters: The purpose of this function is just to pick out all the possible ancestors (parent, grandparent, etc.) of an arbitrary number of locations in a hierarchy. The Node column is a hierarchyid, and dbo.GetAncestors is a CLR function that simply walks up the path, it does not do any data access.

UniqueIntTable is what it implies - it's a user-defined table type with one column, Value int NOT NULL PRIMARY KEY. Everything here that should be indexed is indexed - the execution plan of function A is essentially just two index seeks and a hash match, as it should be with function B.

Some even stranger aspects to this strange problem:

I'm not even able to get an estimated execution plan for a simple query using function B. It almost looks like the performance issue has something to do with the compilation of this simple-looking function.
If I take the "body" out of function B and just stick it into an inline query, it runs normally, same performance as function A. So it only seems to be a problem with a CTE inside a UDF, or conversely, only with a UDF that uses a CTE.
The CPU usage on one core on the test machine spikes all the way up to 100% when I try to run B. There d开发者_运维技巧oesn't seem to be much I/O.

I want to just shrug it off as a SQL Server bug and use version A, but I always try to keep Rule #1 ("SELECT Ain't Broken") in mind, and I'm concerned that the good results from function A are somehow a localized fluke, that it will "fail" the same way that B does on a different server.

Any ideas?

UPDATE - I'm now including a complete self-contained script to reproduce.

GetAncestors Function

[SqlFunction(FillRowMethodName = "FillAncestor", 
    TableDefinition = "Ancestor hierarchyid", IsDeterministic = true,
    IsPrecise = true, DataAccess = DataAccessKind.None)]
public static IEnumerable GetAncestors(SqlHierarchyId h)
{
    while (!h.IsNull)
    {
        yield return h;
        h = h.GetAncestor(1);
    }
}

Schema Creation

BEGIN TRAN

CREATE TABLE Hierarchy
(
    ID int NOT NULL IDENTITY(1, 1)
        CONSTRAINT PK_Hierarchy PRIMARY KEY CLUSTERED,
    Node hierarchyid NOT NULL,
    [Level] as Node.GetLevel(),
    Name varchar(50) NOT NULL
)

CREATE INDEX IX_Hierarchy_Node
ON Hierarchy (Node)
INCLUDE (Name)

CREATE INDEX IX_Hierarchy_NodeBF
ON Hierarchy ([Level], Node)

GO

INSERT Hierarchy (Node, Name)
    SELECT CAST('/1/' AS hierarchyid), 'Alice' UNION ALL
    SELECT CAST('/1/1/' AS hierarchyid), 'Bob' UNION ALL
    SELECT CAST('/1/1/1/' AS hierarchyid), 'Charles' UNION ALL
    SELECT CAST('/1/1/2/' AS hierarchyid), 'Dave' UNION ALL
    SELECT CAST('/1/1/3/' AS hierarchyid), 'Ellen' UNION ALL
    SELECT CAST('/1/2/' AS hierarchyid), 'Fred' UNION ALL
    SELECT CAST('/1/3/' AS hierarchyid), 'Graham' UNION ALL
    SELECT CAST('/1/3/1/' AS hierarchyid), 'Harold' UNION ALL
    SELECT CAST('/1/3/2/' AS hierarchyid), 'Isabelle' UNION ALL
    SELECT CAST('/1/4/' AS hierarchyid), 'John' UNION ALL
    SELECT CAST('/2/' AS hierarchyid), 'Karen' UNION ALL
    SELECT CAST('/2/1/' AS hierarchyid), 'Liam' UNION ALL
    SELECT CAST('/2/2/' AS hierarchyid), 'Mary' UNION ALL
    SELECT CAST('/2/2/1/' AS hierarchyid), 'Nigel' UNION ALL
    SELECT CAST('/2/2/2/' AS hierarchyid), 'Oliver' UNION ALL
    SELECT CAST('/2/3/' AS hierarchyid), 'Peter' UNION ALL
    SELECT CAST('/2/3/1/' AS hierarchyid), 'Quinn'

GO

CREATE TYPE UniqueIntTable AS TABLE 
(
    Value int NOT NULL,
    PRIMARY KEY (Value)
)

GO

COMMIT

GO

The above code/script can be used to create the CLR function/DB schema; use the same GoodFunction and BadFunction scripts in the original.

Haha, try this:

IF OBJECT_ID('_HappyFunction' ) IS NOT NULL DROP FUNCTION _HappyFunction
IF OBJECT_ID('_SadFunction'   ) IS NOT NULL DROP FUNCTION _SadFunction
IF TYPE_ID  ('_UniqueIntTable') IS NOT NULL DROP TYPE _UniqueIntTable
GO

CREATE TYPE _UniqueIntTable AS TABLE (Value int NOT NULL PRIMARY KEY)
GO

CREATE FUNCTION _HappyFunction (@IDs _UniqueIntTable READONLY)
RETURNS TABLE AS RETURN
  SELECT Value FROM @IDs
GO

CREATE FUNCTION _SadFunction (@IDs _UniqueIntTable READONLY)
RETURNS TABLE AS RETURN 
  WITH CTE AS (SELECT Value FROM @IDs)
  SELECT Value FROM CTE
GO

-- this will return an empty record set
DECLARE @IDs _UniqueIntTable 
SELECT * FROM _HappyFunction(@IDs)
GO

-- this will hang
DECLARE @IDs _UniqueIntTable 
SELECT * FROM _SadFunction(@IDs)
GO

Who would have guessed?

I've reproduced the behavior on SQL 2008 SP1, substituting a SQL UDF for the CLF UDF dbo.GetAncestors. I tried both a table valued function and an in-line function; neither one made a difference.

I don't know what is going on yet, but the benefit of others, I'll include my definitions below.

-- try a recursive inline UDF...
CREATE FUNCTION dbo.GetAncestors(@hierarchyid hierarchyid)
RETURNS TABLE AS RETURN (
WITH recurse AS (
    SELECT @hierarchyid AS Ancestor
    WHERE @hierarchyid IS NOT NULL
    UNION ALL
    SELECT Ancestor.GetAncestor(1) FROM recurse
    WHERE Ancestor.GetAncestor(1) IS NOT NULL
    )
SELECT * FROM recurse
)

-- ...or a table-valued UDF, it makes no difference
CREATE FUNCTION dbo.GetAncestors(@hierarchyid hierarchyid)
RETURNS @return TABLE (Ancestor hierarchyid) 
AS BEGIN
    WHILE @hierarchyid IS NOT NULL BEGIN
        INSERT @return (Ancestor)
        VALUES (@hierarchyid)
        SET @hierarchyid = @hierarchyid.GetAncestor(1)
    END             
    RETURN
END

Choose one of the definitions above, and then run this to watch it hang:

DECLARE @IDs UniqueIntTable 
INSERT @IDs SELECT ID FROM Hierarchy
RAISERROR('we have inserted %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
SELECT * FROM dbo.GoodFunction(@IDs) a
RAISERROR('we have returned %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
GO

DECLARE @IDs UniqueIntTable 
INSERT @IDs SELECT ID FROM Hierarchy
RAISERROR('we have inserted %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
SELECT * FROM dbo.BadFunction(@IDs) a
RAISERROR('we have returned %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
GO

The second batch never even starts. It gets past the parse stage but appears to get lost somewhere between bind and optimize.

The bodies of both functions compile to exactly the same execution plan, outside the function wrapper:

SET SHOWPLAN_TEXT ON
GO
DECLARE @IDs UniqueIntTable 
INSERT @IDs SELECT ID FROM Hierarchy
SELECT p.ID, p.Node, p.Name, p.[Level]
FROM
(
    SELECT DISTINCT a.Ancestor AS Node
    FROM Hierarchy c 
    CROSS APPLY dbo.GetAncestors_IF(c.Node.GetAncestor(1)) a
    WHERE c.ID IN (SELECT Value FROM @IDs)
) np
INNER JOIN Hierarchy p
ON p.Node = np.Node

;WITH Ancestors_CTE AS
(
    SELECT DISTINCT a.Ancestor AS Node
    FROM Hierarchy c
    CROSS APPLY dbo.GetAncestors_IF(c.Node.GetAncestor(1)) a
    WHERE c.ID IN (SELECT Value FROM @IDs)
)
SELECT p.ID, p.Node, p.Name, p.[Level]
FROM Ancestors_CTE ac
INNER JOIN Hierarchy p
ON p.Node = ac.Node


-- both return this:

    |--Nested Loops(Inner Join, OUTER REFERENCES:([p].[Node]))
         |--Compute Scalar(DEFINE:([p].[Level]=[Scratch].[dbo].[Hierarchy].[Level] as [p].[Level]))
         |    |--Compute Scalar(DEFINE:([p].[Level]=[Scratch].[dbo].[Hierarchy].[Node] as [p].[Node].GetLevel()))
         |         |--Index Scan(OBJECT:([Scratch].[dbo].[Hierarchy].[IX_Hierarchy_Node] AS [p]))
         |--Top(TOP EXPRESSION:((1)))
              |--Filter(WHERE:([Recr1005]=[Scratch].[dbo].[Hierarchy].[Node] as [p].[Node]))
                   |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[Node]))
                        |--Nested Loops(Inner Join, OUTER REFERENCES:([Value]))
                        |    |--Clustered Index Scan(OBJECT:(@IDs))
                        |    |--Clustered Index Seek(OBJECT:([Scratch].[dbo].[Hierarchy].[PK_Hierarchy] AS [c]), SEEK:([c].[ID]=[Value]) ORDERED FORWARD)
                        |--Index Spool(WITH STACK)
                             |--Concatenation
                                  |--Compute Scalar(DEFINE:([Expr1011]=(0)))
                                  |    |--Constant Scan(VALUES:(([Scratch].[dbo].[Hierarchy].[Node] as [c].[Node].GetAncestor((1)))))
                                  |--Assert(WHERE:(CASE WHEN [Expr1013]>(100) THEN (0) ELSE NULL END))
                                       |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1013], [Recr1003]))
                                            |--Compute Scalar(DEFINE:([Expr1013]=[Expr1012]+(1)))
                                            |    |--Table Spool(WITH STACK)
                                            |--Compute Scalar(DEFINE:([Expr1004]=[Recr1003].GetAncestor((1))))
                                                 |--Filter(WHERE:(STARTUP EXPR([Recr1003].GetAncestor((1)) IS NOT NULL)))
                                                      |--Constant Scan

Very interesting. Submit a bug report at Microsoft Connect, have them tell you what's going on.

This is a guess and just a guess, but perhaps it has something to do w/ how the optimizer makes a pretty good guess at the best execution plan, but does not make an exhaustive search for one.

So, query execution works like this:

parse -> bind -> optimize -> execute

The parse trees for your two queries will certainly be different. The bind trees are probably different. I don't know enough about the bind phase to state that conclusively, but assuming the bind trees are different, then it may require a different number of transforms to get the A and B bind trees to the same execution plan.

If it takes two additional transforms to get query B to the ~5ms plan, the optimizer may say "good enough" before discovering it. Whereas for query A, the ~5ms plan maybe just inside the search cost threshold.

In the first statement, your join is

np INNER JOIN Hierarchy p
    ON p.Node = np.Node

your second statement is

Ancestors_CTE a
INNER JOIN Hierarchy p
ON p.Node = a.Node

However, a is also used as an alias for dbo.GetAncestors(c.Node.GetAncestor(1)) in the CT. Try exchanging Ancestors_CTE a with e.g. Ancestor_CTE acte, to ensure the optimizer isn't confused with the double use of a as an alias.

That said, I'm not sure how good SQL server is at appliying the correct indexes when creating a CTE. I've had problems with this before, and used table variables instead with great success.

As I understand it when using CTEs in batch you must end the statement with a ";". It has something to do with the interpretation of the WITH clause. Try this:

IF OBJECT_ID('_HappyFunction' ) IS NOT NULL DROP FUNCTION _HappyFunction  
IF OBJECT_ID('_NowHappyFunction') IS NOT NULL DROP FUNCTION _NowHappyFunction  
IF TYPE_ID  ('_UniqueIntTable') IS NOT NULL DROP TYPE _UniqueIntTable  
GO  

CREATE TYPE _UniqueIntTable AS TABLE (Value int NOT NULL PRIMARY KEY)  
GO  

CREATE FUNCTION _HappyFunction (@IDs _UniqueIntTable READONLY)  
RETURNS TABLE AS RETURN  
  SELECT Value FROM @IDs  
GO  

CREATE FUNCTION _NowHappyFunction (@IDs _UniqueIntTable READONLY)  
RETURNS @Table TABLE
(
Value INT
)
BEGIN
  ;WITH CTE AS (SELECT Value FROM @IDs)
  INSERT INTO @Table
  SELECT Value FROM CTE
  RETURN
END
GO

-- this will return an empty record set  
DECLARE @IDs _UniqueIntTable   
SELECT * FROM _HappyFunction(@IDs)  
GO  

-- this will no longer hang and will also return an empty record set 
DECLARE @IDs _UniqueIntTable   
SELECT * FROM _NowHappyFunction(@IDs)  
GO

继续阅读：common-table-expression performance sql-server sql-server-2008 user-defined-functions

Bizarre performance issue: Common Table Expressions in inline User-Defined Function

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？