Median values in T-SQL

2023-04-01 13:31

For an even number of rows, the median is the average of the two middle values, (104.5 + 108)/2 for the left column of the table below. For an odd number of rows it is the middle value, 108 for the right column:

Total (even count)    Total (odd count)

100                   100
101                   101
104.5                 104.5
108                   108
108.3                 108.3
112                   112
                      114

I wrote this query, and it calculates the correct median when the number of rows is odd:

WITH    a AS ( SELECT   Total ,
                        ROW_NUMBER() OVER ( ORDER BY CAST(Total AS FLOAT) ASC ) rownumber
               FROM     [Table] A
             ),
        b AS ( SELECT TOP 2
                        Total ,
                        isodd
               FROM     ( SELECT TOP 50 PERCENT
                                    Total ,
                                    rownumber % 2 isodd
                          FROM      a
                          ORDER BY  CAST(Total AS FLOAT) ASC
                        ) a
               ORDER BY CAST(total AS FLOAT) DESC
             )
    SELECT  *
    FROM    b

What is the general T-SQL query to find the median in both situations, that is, when the number of rows is odd and also when it is even?

Could my query be tweaked so that it works for both an even and an odd number of rows?

I wrote a blog post about Mean, Median and Mode a couple of years ago. I encourage you to read it.

Calculating Mean, Median, and Mode with SQL Server

SELECT ((
        SELECT TOP 1 Total
        FROM   (
                SELECT  TOP 50 PERCENT Total
                FROM    [TABLE] A
                WHERE   Total IS NOT NULL
                ORDER BY Total
                ) AS A
        ORDER BY Total DESC) +
        (
        SELECT TOP 1 Total
        FROM   (
                SELECT  TOP 50 PERCENT Total
                FROM    [TABLE] A
                WHERE   Total IS NOT NULL
                ORDER BY Total DESC
                ) AS A
        ORDER BY Total ASC)) / 2
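
To see how the two TOP 50 PERCENT halves meet in the middle, here is a minimal sketch that runs the same query shape against the six-row column from the question. The table variable @t and the aliases lower_half and upper_half are hypothetical names introduced only for this illustration.

```sql
-- Hypothetical table variable holding the six-row column from the question.
DECLARE @t TABLE (Total DECIMAL(10, 1));
INSERT INTO @t (Total)
VALUES (100), (101), (104.5), (108), (108.3), (112);

SELECT ((
        SELECT TOP 1 Total
        FROM   (SELECT TOP 50 PERCENT Total
                FROM   @t
                WHERE  Total IS NOT NULL
                ORDER BY Total) AS lower_half
        ORDER BY Total DESC)          -- largest value of the lower half: 104.5
        +
        (
        SELECT TOP 1 Total
        FROM   (SELECT TOP 50 PERCENT Total
                FROM   @t
                WHERE  Total IS NOT NULL
                ORDER BY Total DESC) AS upper_half
        ORDER BY Total ASC)           -- smallest value of the upper half: 108
       ) / 2 AS median;               -- (104.5 + 108) / 2 = 106.25
```

With a seventh row of 114 added, TOP 50 PERCENT rounds 3.5 rows up to 4, so both halves end on 108 and the result is 108. Note that if Total were an integer column, the final / 2 would be integer division; dividing by 2.0 would avoid that.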

I know you were looking for a solution that works with SQL Server 2008, but in case anyone is looking for the MEDIAN() aggregate function in SQL Server 2012, they can emulate it using the PERCENTILE_CONT() inverse distribution function:

WITH t(value) AS (
  SELECT 1   UNION ALL
  SELECT 2   UNION ALL
  SELECT 100 
)
SELECT DISTINCT
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY 1)
FROM
  t;

This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here. Unfortunately, SQL Server only supports this function as a window function, not as a regular ordered-set aggregate function like Oracle or PostgreSQL.
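
Because the function is a window function, the PARTITION BY clause can also be used to get one median per group. The following is a sketch only, assuming SQL Server 2012 or later; the grp column and the sample values are made up for illustration.

```sql
-- grp is a hypothetical grouping column; the values are sample data.
WITH t(grp, value) AS (
  SELECT 'a', 1.0  UNION ALL
  SELECT 'a', 3.0  UNION ALL
  SELECT 'b', 10.0 UNION ALL
  SELECT 'b', 20.0 UNION ALL
  SELECT 'b', 40.0
)
SELECT DISTINCT
  grp,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value)
    OVER (PARTITION BY grp) AS median   -- a: 2 (average of 1 and 3), b: 20
FROM
  t;
```

DISTINCT is needed because the window function repeats the same median on every row of its partition.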

An example of the issue mentioned in my comment on the accepted answer:

select avg(Total) median from
(
select Total, 
rnasc = row_number() over(order by Total),
rndesc = row_number() over(order by Total desc)
 from [Table] 
) b
where rnasc between rndesc - 1 and rndesc + 1

This snippet is not guaranteed to work if there are duplicate values in the input dataset. row_number() may then number tied rows in an arbitrary order, so rnasc and rndesc are not guaranteed to mirror each other.

For example for the input:

IF OBJECT_ID('tempdb..#b') IS NOT NULL DROP TABLE #b  -- avoid an error on the first run
CREATE TABLE #b (id INT IDENTITY, Total INT)
INSERT INTO #b 
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT  5 
UNION ALL SELECT  5 UNION ALL SELECT  5

The inner query returns (I guess this may differ between servers):

Total   rnasc   rndesc
5       3      1
5       4      2
5       5      3
1       1      4
1       2      5

Running the outer query results in NULL (as there is no row where rnasc is between rndesc - 1 and rndesc + 1).

A simple solution is to add a surrogate key (I used an identity column) to the data set and include this column in the OVER() clauses:

SELECT avg(Total) median from
(
SELECT Total, 
rnasc = row_number() over(order by Total, id),
rndesc = row_number() over(order by Total DESC, id desc)
 from #b
) b
WHERE rnasc between rndesc - 1 and rndesc + 1

Now the sort order is deterministic and the inner query returns:

Total   rnasc   rndesc
5       5       1
5       4       2
5       3       3
1       2       4
1       1       5

And the result is correct :)

t-clausen's answer unfortunately does not work correctly when there are lots of duplicate values in the list. In that case the row numbers generated by the two different OVER clauses are not predictable in the way this query needs.

The following worked well in my case:

WITH SortedTable AS
    (
        SELECT Total, 
               rnasc, 
               rndesc = ROW_NUMBER() OVER(ORDER BY rnasc DESC)
        FROM ( 
               SELECT Total, 
                      rnasc = ROW_NUMBER() OVER(ORDER BY Total)
               FROM   [Table]
             ) SourceTable
    )
SELECT DISTINCT AVG(Total) median 
FROM   SortedTable
WHERE  rnasc = rndesc OR ABS(rnasc-rndesc) = 1

The WHERE clause now also clearly distinguishes between an even and an odd number of records.
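
A related variant, not taken from any of the answers here, avoids the second row number entirely by comparing a single ROW_NUMBER() with the total row count. This is a sketch under the assumption that COUNT(*) OVER () is available (SQL Server 2005 and later); the table variable @b and the aliases rn, cnt and s are hypothetical names.

```sql
-- Hypothetical table variable with the duplicate-heavy data from the example above.
DECLARE @b TABLE (Total INT);
INSERT INTO @b (Total) VALUES (1), (1), (5), (5), (5);

SELECT AVG(1.0 * Total) AS median      -- 1.0 * avoids integer averaging
FROM (
    SELECT Total,
           rn  = ROW_NUMBER() OVER (ORDER BY Total),
           cnt = COUNT(*) OVER ()
    FROM   @b
    WHERE  Total IS NOT NULL
) AS s
-- Odd cnt: both expressions give the middle row. Even cnt: the two middle rows.
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2);
```

For the five rows above, cnt is 5 and both expressions evaluate to 3, so only the third row (Total = 5) is averaged. Since only one ordering is involved, ties among duplicates cannot put the two numberings out of step.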

I know this is an ancient question, but I am posting this anyway for other people's sake. The performance of PERCENTILE_CONT(0.5) is stupid slow: on a table with 4.9 million records it took 52 seconds. G Mastros's answer above is better (and my favorite, except for mine), but it still took 35 seconds on my table. I tweaked his solution as follows, and it ran in 7 seconds without an index on the column. When I added an index it dropped to 2 seconds. All I did was replace the 50 PERCENT with an integer division of the record count in the table.

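-- Count only non-NULL values, matching the WHERE clauses below.
DECLARE @Cnt int = (SELECT COUNT(Total) FROM [TABLE]);
-- Round up, as TOP 50 PERCENT does, so that an odd count selects
-- the middle row from both directions.
DECLARE @Half int = (@Cnt + 1) / 2;

SELECT ((
    SELECT TOP 1 Total
    FROM   (
            SELECT  TOP (@Half) Total
            FROM    [TABLE] A
            WHERE   Total IS NOT NULL
            ORDER BY Total
            ) AS A
    ORDER BY Total DESC) +
    (
    SELECT TOP 1 Total
    FROM   (
            SELECT  TOP (@Half) Total
            FROM    [TABLE] A
            WHERE   Total IS NOT NULL
            ORDER BY Total DESC
            ) AS A
    ORDER BY Total ASC)) / 2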
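
The answer does not say which index was added. A plain nonclustered index on the sorted column is one plausible reading, sketched below; the index name IX_Table_Total is hypothetical.

```sql
-- IX_Table_Total is a hypothetical index name; [TABLE] and Total are the
-- placeholder names used in the query above.
CREATE NONCLUSTERED INDEX IX_Table_Total
    ON [TABLE] (Total);
```

An index ordered on Total can let both TOP ... ORDER BY subqueries read rows from either end of the index instead of sorting the whole table, which would be consistent with the speedup reported above.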

Tags: sql sql-server-2008 tsql
