25th percentile for multiple columns
Let's say I have a table with columns A, B, C, D, E, etc., where each column is an int. This table is populated with hundreds of rows with random values in each column. What is the optimal way to return the value corresponding to the 25th percentile for each column?
For example:
A B C D E
1 5 8 9 3
3 6 5 0 2
8 3 6 1 8
4 1 8 3 1
7 2 2 6 9
Here column A is composed of the values 1, 3, 4, 8, and 7. The 25th percentile for this set should be 3. Column B is composed of the values 1, 2, 3, 5, and 6. The 25th percentile for this set should be 2. C is 5, D is 1, and E is 2. Since there are 5 rows, the 25th percentile is simply the second smallest value. If there were 40 rows it would be the 10th smallest value. How can I efficiently write a query that returns:
A B C D E
3 2 5 1 2
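To pin down the definition I mean, here is a small Python sketch of the rule described above (second smallest of 5 rows, 10th smallest of 40), which amounts to taking the ceil(0.25 * n)-th smallest value. The function name nearest_rank_percentile is just something I made up for illustration, and it assumes the column has no NULLs.

```python
def nearest_rank_percentile(values, pct=25):
    """Return the ceil(pct/100 * n)-th smallest value (1-based rank)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped so the rank is at least 1.
    rank = max((pct * len(ordered) + 99) // 100, 1)
    return ordered[rank - 1]


# Sample table from the question, one tuple per row (columns A..E).
rows = [
    (1, 5, 8, 9, 3),
    (3, 6, 5, 0, 2),
    (8, 3, 6, 1, 8),
    (4, 1, 8, 3, 1),
    (7, 2, 2, 6, 9),
]
column_names = ["A", "B", "C", "D", "E"]

# Each column is ranked independently of the others.
result = {
    name: nearest_rank_percentile([row[i] for row in rows])
    for i, name in enumerate(column_names)
}
print(result)
```

For the sample data this gives A=3, B=2, C=5, D=1, E=2, which is the row I want the query to return.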
What I've tried:
WITH
ATable as (
SELECT A, CAST(NTILE(100) OVER (ORDER BY A) as int) as Percentile
FROM MyTable
),
BTable as (
SELECT B, CAST(NTILE(100) OVER (ORDER BY B) as int) as Percentile
FROM MyTable
),
...
SELECT
(SELECT TOP 1 A FROM ATable where Percentile = 25) as A,
(SELECT TOP 1 B FROM BTable where Percentile = 25) as B,
...
I have about 40 of these columns, and in my experience with my dataset, running NTILE over that many columns becomes a serious performance problem. Does anyone have a better idea?
Your solution is correct. I don't see a whizzier solution.
You require a different row per column so you have to query each column separately. If you have 40 columns then it's 40 CTEs joined together. Simple.
Normally, you'd expect a query to return a whole row, a group of rows, or a subset of columns for the row(s) that meet some criteria. You would not expect each column to be queried separately to get a value unrelated to the other values in the same row.
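Since each column has to be sorted on its own anyway, one variant that might be worth measuring keeps the per-column query but drops NTILE: compute the target rank once from the row count, then fetch the single row at that offset for each column. The sketch below is only an illustration, run through Python's standard sqlite3 module so it is self-contained. It uses SQLite's LIMIT 1 OFFSET syntax; on SQL Server 2012 and later the equivalent clause is ORDER BY ... OFFSET n ROWS FETCH NEXT 1 ROWS ONLY. The helper name percentile_row is hypothetical, the columns are assumed to contain no NULLs, and I have not benchmarked this against the NTILE version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (A INT, B INT, C INT, D INT, E INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 5, 8, 9, 3),
        (3, 6, 5, 0, 2),
        (8, 3, 6, 1, 8),
        (4, 1, 8, 3, 1),
        (7, 2, 2, 6, 9),
    ],
)


def percentile_row(conn, columns, pct=25):
    """Return {column: value at the ceil(pct/100 * n)-th smallest rank}."""
    n = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
    if n == 0:
        return {}
    # 1-based rank via integer ceiling, converted to a 0-based offset.
    offset = max((pct * n + 99) // 100, 1) - 1
    out = {}
    for col in columns:
        # Column names are interpolated into the SQL text, so only pass
        # trusted, known column names here.
        sql = f"SELECT {col} FROM MyTable ORDER BY {col} LIMIT 1 OFFSET ?"
        out[col] = conn.execute(sql, (offset,)).fetchone()[0]
    return out


print(percentile_row(conn, ["A", "B", "C", "D", "E"]))
```

This still sorts every column once, so it is the same shape of work as the 40 CTEs; the only difference is that the row count is computed a single time and no tile number is assigned to every row.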
I am not sure whether the following article is still relevant for current versions of MS SQL Server, but you might check it out:
A Better NTILE