Selecting from a subset based on a value inside the same subset?

2023-03-23 15:09 Q&A author:

I have created a table like this:

CREATE TABLE #TEMP(RecordDate datetime, First VARCHAR(255), Last VARCHAR(255), Value int)

INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','smith',10)
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','adams',60)
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','resig',90)
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','balte',95)

INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','smith',98)
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','adams',67)
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','resig',24)
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','balte',20)

SELECT * FROM #TEMP

DROP TABLE #TEMP

which now contains the following records:

RecordDate              First   Last    Value
2011-03-01 00:00:00.000 john    smith   10
2011-03-01 00:00:00.000 john    adams   60
2011-03-01 00:00:00.000 john    resig   90
2011-03-01 00:00:00.000 john    balte   95
2011-03-01 01:00:00.000 john    smith   98
2011-03-01 01:00:00.000 john    adams   67
2011-03-01 01:00:00.000 john    resig   24
2011-03-01 01:00:00.000 john    balte   20

I am trying to obtain a table like the following:

RecordDate                First    Good     Bad
2011-03-01 00:00:00.000   john     3        1
2011-03-01 01:00:00.000   john     2        2

I compute Good and Bad per date and first name: I take the MAX(Value) of all rows with first name john on that date, then use it as a threshold on the original rows for that same date and first name. A value greater than 0.5 * MAX(Value) is Good; every other value is Bad.

For the first date the maximum is 95, so the threshold is 0.5 * 95 = 47.5; only 60, 90 and 95 exceed it, giving (Good, Bad) = (3, 1). For the second date the maximum is 98 and the threshold is 49; 98 and 67 exceed it while 24 and 20 do not, giving (2, 2).
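To make the rule concrete, here is a small in-memory sketch in Python (not SQL) that applies it to the sample rows above: group by (RecordDate, First), take the group maximum, and count the values above half of it. The names ROWS and good_bad_counts are hypothetical and only illustrate the logic; this is not meant as a way to process 300 million rows.

```python
from collections import defaultdict

# Sample rows from the question: (RecordDate, First, Last, Value)
ROWS = [
    ("2011-03-01 00:00:00.000", "john", "smith", 10),
    ("2011-03-01 00:00:00.000", "john", "adams", 60),
    ("2011-03-01 00:00:00.000", "john", "resig", 90),
    ("2011-03-01 00:00:00.000", "john", "balte", 95),
    ("2011-03-01 01:00:00.000", "john", "smith", 98),
    ("2011-03-01 01:00:00.000", "john", "adams", 67),
    ("2011-03-01 01:00:00.000", "john", "resig", 24),
    ("2011-03-01 01:00:00.000", "john", "balte", 20),
]


def good_bad_counts(rows):
    """Return {(record_date, first): (good, bad)} using the 0.5 * MAX rule."""
    # Collect every value belonging to the same (date, first name) group.
    groups = defaultdict(list)
    for record_date, first, _last, value in rows:
        groups[(record_date, first)].append(value)

    result = {}
    for key, values in groups.items():
        threshold = 0.5 * max(values)  # e.g. 0.5 * 95 = 47.5
        good = sum(1 for v in values if v > threshold)
        result[key] = (good, len(values) - good)  # everything else is Bad
    return result


for (record_date, first), (good, bad) in sorted(good_bad_counts(ROWS).items()):
    print(record_date, first, good, bad)
```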

My real table is large (close to 300 million rows), and I don't know where to start to make this efficient. What might an efficient approach look like?

My current approach (it works, but it is expensive) is given below:

-- Correlated subqueries: each group re-reads #TEMP to find its MAX and count rows
SELECT    RecordDate
        , First
        , 
        (
            SELECT COUNT(*) 
            FROM #TEMP
            WHERE Value > 0.5*(SELECT MAX(Value) FROM #TEMP WHERE RecordDate = A.RecordDate AND First = A.First)
            AND RecordDate = A.RecordDate AND First = A.First
        ) AS Good
        ,
        (
            -- <= so that a value of exactly half the maximum is still counted as Bad
            SELECT COUNT(*) 
            FROM #TEMP
            WHERE Value <= 0.5*(SELECT MAX(Value) FROM #TEMP WHERE RecordDate = A.RecordDate AND First = A.First)
            AND RecordDate = A.RecordDate AND First = A.First
        ) AS Bad
FROM #TEMP A
GROUP BY RecordDate, First;

Here you go:

select 
   t.RecordDate,
   COUNT(case 
           when t.Value > MV.MaxValue * 0.5 then 1
           else null
         end) Good,
   COUNT(case 
           when t.Value <= MV.MaxValue * 0.5 then 1
           else null
         end) Bad
from #Temp t inner join
(select RecordDate, MAX(Value) MaxValue
 from #Temp Group By RecordDate) MV on t.RecordDate = MV.RecordDate
Group by t.RecordDate

The trick is to create a derived table holding the maximum value for each record date and INNER JOIN it back to the table itself. Once the maxima are computed, every row can be compared directly with the maximum of its group.

Update

I see you updated your question and included the first name in the result. Never fear, here's the solution:

select 
   t.RecordDate,
   t.First,
   COUNT(case 
           when t.Value > MV.MaxValue * 0.5 then 1
           else null
         end) Good,
   COUNT(case 
           when t.Value <= MV.MaxValue * 0.5 then 1
           else null
         end) Bad
from #Temp t inner join
(select RecordDate, First, MAX(Value) MaxValue
 from #Temp Group By RecordDate, First) MV 
   on (t.RecordDate = MV.RecordDate and t.First = MV.First)
Group by t.RecordDate, t.First
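The join-to-derived-table approach can be checked end to end on the sample data. Below is a sketch that uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server. SQLite has no #temp tables, so the table is named records, and the helper good_bad_via_join is hypothetical; the query mirrors the one above, with an ORDER BY added for deterministic output. It demonstrates correctness only, not SQL Server performance.

```python
import sqlite3

# Sample rows from the question: (RecordDate, First, Last, Value)
ROWS = [
    ("2011-03-01 00:00:00.000", "john", "smith", 10),
    ("2011-03-01 00:00:00.000", "john", "adams", 60),
    ("2011-03-01 00:00:00.000", "john", "resig", 90),
    ("2011-03-01 00:00:00.000", "john", "balte", 95),
    ("2011-03-01 01:00:00.000", "john", "smith", 98),
    ("2011-03-01 01:00:00.000", "john", "adams", 67),
    ("2011-03-01 01:00:00.000", "john", "resig", 24),
    ("2011-03-01 01:00:00.000", "john", "balte", 20),
]

# "First" is quoted defensively because FIRST is a keyword in newer SQLite.
QUERY = """
SELECT t.RecordDate,
       t."First",
       COUNT(CASE WHEN t.Value >  mv.MaxValue * 0.5 THEN 1 END) AS Good,
       COUNT(CASE WHEN t.Value <= mv.MaxValue * 0.5 THEN 1 END) AS Bad
FROM records AS t
INNER JOIN (
    -- derived table: one MAX per (RecordDate, First), computed once
    SELECT RecordDate, "First", MAX(Value) AS MaxValue
    FROM records
    GROUP BY RecordDate, "First"
) AS mv
    ON t.RecordDate = mv.RecordDate AND t."First" = mv."First"
GROUP BY t.RecordDate, t."First"
ORDER BY t.RecordDate, t."First"
"""


def good_bad_via_join(rows):
    """Load rows into an in-memory SQLite table and run the join query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE records (RecordDate TEXT, "First" TEXT, "Last" TEXT, Value INTEGER)'
        )
        conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in good_bad_via_join(ROWS):
    print(row)
```

COUNT ignores the NULL that a CASE without ELSE produces, which is why the `else null` branch in the T-SQL version can be dropped without changing the result.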

The nested queries that refer to the outer query (correlated subqueries) may be causing a lot of repeated work. This query calculates the MAX for every name and date in one go:

SELECT RecordDate, First, MAX(Value) FROM #TEMP GROUP BY RecordDate, First

Now join back to the original data:

SELECT A.RecordDate, A.First,
       SUM(CASE WHEN Value > MaxVal*0.5 THEN 1 ELSE 0 END) AS GOOD,
       SUM(CASE WHEN Value > MaxVal*0.5 THEN 0 ELSE 1 END) AS BAD
FROM #TEMP A INNER JOIN
     (SELECT RecordDate, First, MAX(Value) as MaxVal 
      FROM #TEMP GROUP BY RecordDate, First) B 
         ON (A.RecordDate = B.RecordDate AND A.First = B.First)
GROUP BY A.RecordDate, A.First
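A further option, not suggested in either answer above, is to compute the group maximum with a window aggregate, MAX(Value) OVER (PARTITION BY RecordDate, First), so each row carries its group's maximum without a self-join. SQL Server has supported aggregate OVER (PARTITION BY ...) since 2005, but on older versions windowed aggregates can be slower than the join, so measure both on the real 300-million-row table. The sketch below runs the pattern on the sample data with Python's sqlite3 (window functions need SQLite 3.25 or later); the table name records and the helper good_bad_via_window are hypothetical.

```python
import sqlite3

# Sample rows from the question: (RecordDate, First, Last, Value)
ROWS = [
    ("2011-03-01 00:00:00.000", "john", "smith", 10),
    ("2011-03-01 00:00:00.000", "john", "adams", 60),
    ("2011-03-01 00:00:00.000", "john", "resig", 90),
    ("2011-03-01 00:00:00.000", "john", "balte", 95),
    ("2011-03-01 01:00:00.000", "john", "smith", 98),
    ("2011-03-01 01:00:00.000", "john", "adams", 67),
    ("2011-03-01 01:00:00.000", "john", "resig", 24),
    ("2011-03-01 01:00:00.000", "john", "balte", 20),
]

# Window-aggregate variant: MAX(...) OVER (PARTITION BY ...) attaches each
# group's maximum to every row, so no join back to the table is needed.
QUERY = """
SELECT RecordDate,
       "First",
       SUM(CASE WHEN Value >  MaxValue * 0.5 THEN 1 ELSE 0 END) AS Good,
       SUM(CASE WHEN Value <= MaxValue * 0.5 THEN 1 ELSE 0 END) AS Bad
FROM (
    SELECT RecordDate, "First", Value,
           MAX(Value) OVER (PARTITION BY RecordDate, "First") AS MaxValue
    FROM records
) AS w
GROUP BY RecordDate, "First"
ORDER BY RecordDate, "First"
"""


def good_bad_via_window(rows):
    """Load rows into an in-memory SQLite table and run the window query."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE records (RecordDate TEXT, "First" TEXT, "Last" TEXT, Value INTEGER)'
        )
        conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in good_bad_via_window(ROWS):
    print(row)
```

In T-SQL the same query would read from #TEMP instead of records and would not need the double quotes around First.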

Tags: select, sql, sql-server, sql-server-2008
