开发者

Using a custom aggregate function in a GROUP BY?

I have a simple MEDIAN calculation function:

IF OBJECT_ID(N'COMPUTEMEDIAN', N'FN') IS NOT NULL
    DROP FUNCTION dbo.COMPUTEMEDIAN;
GO
CREATE FUNCTION dbo.COMPUTEMEDIAN(@VALUES NVARCHAR(MAX))
RETURNS DECIMAL
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX)
    DECLARE @MEDIAN DECIMAL
    SET @MEDIAN = 0.0;

    DECLARE @MEDIAN_TEMP TABLE (RawValue DECIMAL);

    -- This is the Killer!
    INSERT INTO @MEDIAN_TEMP
    SELECT s FROM master.dbo.Split(',', @VALUES) OPTION(MAXRECURSION 0)  

    SELECT @MEDIAN =
    (
     (SELECT MAX(RawValue) FROM
       (SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue) AS BottomHalf)
     +
     (SELECT MIN(RawValue) FROM
       (SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue DESC) AS TopHalf)
    ) / 2

    --PRINT @SQL
    RETURN @MEDIAN;
END;
GO

However, my table is of the following form:

CREATE TABLE #TEMP (GroupName VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES ('A', 1.0)
INSERT INTO #TEMP VALUES ('A', 2.0)
INSERT INTO #TEMP VALUES ('A', 3.0)
INSERT INTO #TEMP VALUES ('A', 4.0)
INSERT INTO #TEMP VALUES ('B', 10.0)
INSERT INTO #TEMP VALUES ('B', 11.0)
INSERT INTO #TEMP VALUES ('B', 12.0)

SELECT * FROM #TEMP

DROP TABLE #TEMP

What is the best way to invoke the MEDIAN function on this table using a GROUP BY on the id column? So, I am looking for something like this:

SELECT id, COMPUTEMEDIAN(Values)
FROM #TEMP
GROUP BY id

My current approach involves using XMLPATH to combine all values result开发者_如何学Cing from a GROUP BY operation into a large string and then passing it to the function but this involves the String splitting operation and for large strings this just slows down everything. Any suggestions?


Since you're using SQL Server 2008, I would suggest writing the aggregate function as a CLR function.

http://msdn.microsoft.com/en-us/library/91e6taax(v=vs.80).aspx

Also, people have asked this question before. Perhaps their answers would be helpful

Function to Calculate Median in Sql Server


EDIT: I can confirm this works very very well on a large database (30,000 values)

Hmm... Just came across this so the following works perfectly fine but am not sure how expensive it can turn out to be:

SELECT
   GroupName,
   AVG(Value)
FROM
(
   SELECT
      GroupName,
      cast(Value as decimal(5,2)) Value,
      ROW_NUMBER() OVER (
         PARTITION BY GroupName
         ORDER BY Value ASC) AS RowAsc,
      ROW_NUMBER() OVER (
         PARTITION BY GroupName 
         ORDER BY Value DESC) AS RowDesc
   FROM #TEMP SOH
) x
WHERE 
   RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY GroupName
ORDER BY GroupName;


No need to use a user defined function! Here's how I would do it:

CREATE TABLE #TEMP (id VARCHAR(MAX), Value DECIMAL)

INSERT INTO #TEMP VALUES('A', 1.0)

INSERT INTO #TEMP VALUES('A', 2.0)
INSERT INTO #TEMP VALUES('A', 3.0)
INSERT INTO #TEMP VALUES('A', 4.0)
INSERT INTO #TEMP VALUES('B', 10.0)
INSERT INTO #TEMP VALUES('B', 11.0)
INSERT INTO #TEMP VALUES('B', 12.0)

SELECT 
    (SELECT TOP 1 Value 
        FROM (SELECT TOP(calcs.medianIndex) Value 
                FROM #temp 
                WHERE #temp.ID = calcs.ID ORDER BY Value ASC) AS subSet
        ORDER BY subSet.Value DESC), ID
FROM
(SELECT 
    CASE WHEN count(*) % 2 = 1 THEN count(*)/2 + 1
        ELSE count(*)/2
    END AS medianIndex,
 ID
FROM #TEMP 
GROUP BY ID) AS calcs

DROP TABLE #TEMP

Might want to double check the behavior when there is an even number of records.

EDIT: After reviewing your work in your Median function, I realize that my answer basically just moved your work out of the function and into your regular query. So... why does your median calculation have to be inside of the user-defined function? It seems alot more difficult that way.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜