Using a custom aggregate function in a GROUP BY?
I have a simple MEDIAN calculation function:
IF OBJECT_ID(N'COMPUTEMEDIAN', N'FN') IS NOT NULL
DROP FUNCTION dbo.COMPUTEMEDIAN;
GO
CREATE FUNCTION dbo.COMPUTEMEDIAN(@VALUES NVARCHAR(MAX))
RETURNS DECIMAL
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @SQL NVARCHAR(MAX)
DECLARE @MEDIAN DECIMAL
SET @MEDIAN = 0.0;
DECLARE @MEDIAN_TEMP TABLE (RawValue DECIMAL);
-- This is the Killer!
INSERT INTO @MEDIAN_TEMP
SELECT s FROM master.dbo.Split(',', @VALUES) OPTION(MAXRECURSION 0)
SELECT @MEDIAN =
(
(SELECT MAX(RawValue) FROM
(SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue) AS BottomHalf)
+
(SELECT MIN(RawValue) FROM
(SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue DESC) AS TopHalf)
) / 2
--PRINT @SQL
RETURN @MEDIAN;
END;
GO
However, my table is of the following form:
CREATE TABLE #TEMP (GroupName VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES ('A', 1.0)
INSERT INTO #TEMP VALUES ('A', 2.0)
INSERT INTO #TEMP VALUES ('A', 3.0)
INSERT INTO #TEMP VALUES ('A', 4.0)
INSERT INTO #TEMP VALUES ('B', 10.0)
INSERT INTO #TEMP VALUES ('B', 11.0)
INSERT INTO #TEMP VALUES ('B', 12.0)
SELECT * FROM #TEMP
DROP TABLE #TEMP
What is the best way to invoke the MEDIAN
function on this table using a GROUP BY
on the id
column? So, I am looking for something like this:
SELECT id, COMPUTEMEDIAN(Values)
FROM #TEMP
GROUP BY id
My current approach involves using XMLPATH
to combine all values result开发者_如何学Cing from a GROUP BY
operation into a large string and then passing it to the function but this involves the String splitting operation and for large strings this just slows down everything. Any suggestions?
Since you're using SQL Server 2008, I would suggest writing the aggregate function as a CLR function.
http://msdn.microsoft.com/en-us/library/91e6taax(v=vs.80).aspx
Also, people have asked this question before. Perhaps their answers would be helpful
Function to Calculate Median in Sql Server
EDIT: I can confirm this works very very well on a large database (30,000 values)
Hmm... Just came across this so the following works perfectly fine but am not sure how expensive it can turn out to be:
SELECT
GroupName,
AVG(Value)
FROM
(
SELECT
GroupName,
cast(Value as decimal(5,2)) Value,
ROW_NUMBER() OVER (
PARTITION BY GroupName
ORDER BY Value ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY GroupName
ORDER BY Value DESC) AS RowDesc
FROM #TEMP SOH
) x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY GroupName
ORDER BY GroupName;
No need to use a user defined function! Here's how I would do it:
CREATE TABLE #TEMP (id VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES('A', 1.0)
INSERT INTO #TEMP VALUES('A', 2.0)
INSERT INTO #TEMP VALUES('A', 3.0)
INSERT INTO #TEMP VALUES('A', 4.0)
INSERT INTO #TEMP VALUES('B', 10.0)
INSERT INTO #TEMP VALUES('B', 11.0)
INSERT INTO #TEMP VALUES('B', 12.0)
SELECT
(SELECT TOP 1 Value
FROM (SELECT TOP(calcs.medianIndex) Value
FROM #temp
WHERE #temp.ID = calcs.ID ORDER BY Value ASC) AS subSet
ORDER BY subSet.Value DESC), ID
FROM
(SELECT
CASE WHEN count(*) % 2 = 1 THEN count(*)/2 + 1
ELSE count(*)/2
END AS medianIndex,
ID
FROM #TEMP
GROUP BY ID) AS calcs
DROP TABLE #TEMP
Might want to double check the behavior when there is an even number of records.
EDIT: After reviewing your work in your Median function, I realize that my answer basically just moved your work out of the function and into your regular query. So... why does your median calculation have to be inside of the user-defined function? It seems alot more difficult that way.
精彩评论