Compute median of column in SQL common table expression
In MSSQL2008, I am trying to compute the median of a column of numbers from a common table expression using the classic median que开发者_JS百科ry as follows:
WITH cte AS
(
SELECT number
FROM table
)
SELECT cte.*,
(SELECT
(SELECT (
(SELECT TOP 1 cte.number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number) AS medianSubquery1
ORDER BY cte.number DESC)
+
(SELECT TOP 1 cte.number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number DESC) AS medianSubquery2
ORDER BY cte.number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number
The result set that I get is the following:
NUMBER MEDIAN
x1 x1
x1 x1
x1 x1
x2 x2
x3 x3
In other words, the "median" column is the same as the "number" column when I would expect the median column to be "x1" all the way down. I use a similar expression to compute the mode and it works fine over the same common table expression.
Here's a slightly different way to do it:
WITH cte AS
(
SELECT number
FROM table1
)
SELECT T1.number, T3.median
FROM cte T1,
(
SELECT AVG(number) AS median
FROM
(
SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
FROM cte
) T2
WHERE T2.rn = ((SELECT COUNT(*) FROM table1) + 1) / 2
OR T2.rn = ((SELECT COUNT(*) FROM table1) + 2) / 2
) T3
The problem with your query is that you are doing
SELECT TOP 1 cte.number FROM...
but it isn't correlated with the sub query it is correlated with the Outer query so the subquery is irrelevant. Which explains why you simply end up with the same value all the way down. Removing the cte.
(as below) gives the median of the CTE. Which is a constant value. What are you trying to do?
WITH cte AS
( SELECT NUMBER
FROM master.dbo.spt_values
WHERE TYPE='p'
)
SELECT cte.*,
(SELECT
(SELECT (
(SELECT TOP 1 number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number) AS medianSubquery1
ORDER BY number DESC)
+
(SELECT TOP 1 number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number DESC) AS medianSubquery2
ORDER BY number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number
Returns
NUMBER median
----------- -----------
0 1023
1 1023
2 1023
3 1023
4 1023
5 1023
6 1023
7 1023
This is not an entirely new answer as it mostly expands on Mark Byer's answer, but there are a couple of options for simplifying the query even further.
The first thing is to really make use of CTE's. Not only can you have multiple CTE's, but they can refer to each other. With this in mind, we can create an additional CTE to compute the median based on the results of the first. This encapsulates the median computation and leaves the actual SELECT to do only what it needs to do. Note that the ROW_NUMBER() had to be moved into the first CTE.
;WITH cte AS
(
SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
FROM table1
),
med AS
(
SELECT AVG(number) AS median
FROM cte
WHERE cte.rn = ((SELECT COUNT(*) FROM cte) + 1) / 2
OR cte.rn = ((SELECT COUNT(*) FROM cte) + 2) / 2
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med
And to further reduce complexity, you "could" use a custom CLR Aggregate to handle the Median (such as the one provided in the free SQL# library at http://www.SQLsharp.com/ [which I am the author of]).
;WITH cte AS
(
SELECT number
FROM table1
),
med AS
(
SELECT SQL#.Agg_Median(cte.number) AS median
FROM cte
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med
精彩评论