Selecting from a subset based on a value inside the same subset?
I have created a table like this:
CREATE TABLE #TEMP(RecordDate datetime, First VARCHAR(255), Last VARCHAR(255), Value int)
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','smith','10')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','adams','60')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','resig','90')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','balte','95')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','smith','98')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','adams','67')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','resig','24')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','balte','20')
SELECT * FROM #TEMP
DROP TABLE #TEMP
which now contains the following records:
RecordDate               First  Last   Value
2011-03-01 00:00:00.000  john   smith  10
2011-03-01 00:00:00.000  john   adams  60
2011-03-01 00:00:00.000  john   resig  90
2011-03-01 00:00:00.000  john   balte  95
2011-03-01 01:00:00.000  john   smith  98
2011-03-01 01:00:00.000  john   adams  67
2011-03-01 01:00:00.000  john   resig  24
2011-03-01 01:00:00.000  john   balte  20
I am trying to obtain a table like the following:
RecordDate               First  Good  Bad
2011-03-01 00:00:00.000  john   3     1
2011-03-01 01:00:00.000  john   2     2
The way I am computing Good and Bad is by taking the MAX of Value across all people with the first name john on the specific date, and then applying it as a filter on the original dataset for that particular date and first name. Only values greater than 0.5*MAXValue are considered Good.
In the result table, the first row has 3 good values because the maximum value for the first date was 95 and only 60, 90, 95 are greater than 0.5*95 = 47.5, so the result has (Good,Bad) = (3,1). Likewise, for the second date the maximum is 98 and only 98 and 67 are greater than 0.5*98 = 49, so the result is (2,2).
My table is large, with close to 300 million records, and I do not know where to start to do this efficiently. Any suggestions on what an efficient approach might look like?
My current (working but expensive) approach is given below:
SELECT RecordDate
, First
,
(
SELECT COUNT(*)
FROM #TEMP
WHERE Value > 0.5*(SELECT MAX(Value) FROM #TEMP WHERE RecordDate = A.RecordDate AND First = A.First)
AND RecordDate = A.RecordDate AND First = A.First
) AS Good
,
(
SELECT COUNT(*)
FROM #TEMP
WHERE Value < 0.5*(SELECT MAX(Value) FROM #TEMP WHERE RecordDate = A.RecordDate AND First = A.First)
AND RecordDate = A.RecordDate AND First = A.First
) AS Bad
FROM #TEMP A
GROUP BY RecordDate, First;
Here you go:
select
t.RecordDate,
COUNT(case
when t.Value > MV.MaxValue * 0.5 then 1
else null
end) Good,
COUNT(case
when t.Value <= MV.MaxValue * 0.5 then 1
else null
end) Bad
from #Temp t inner join
(select RecordDate, MAX(Value) MaxValue
from #Temp Group By RecordDate) MV on t.RecordDate = MV.RecordDate
Group by t.RecordDate
The trick is to create a derived table with the max value for each record date and then INNER JOIN it with the table itself. Once the max values are available from the join, you can reference them directly in the CASE expressions.
Update
I see you updated your question and included the first name in the result. Never fear, here's the solution:
select
t.RecordDate,
t.First,
COUNT(case
when t.Value > MV.MaxValue * 0.5 then 1
else null
end) Good,
COUNT(case
when t.Value <= MV.MaxValue * 0.5 then 1
else null
end) Bad
from #Temp t inner join
(select RecordDate, First, MAX(Value) MaxValue
from #Temp Group By RecordDate, First) MV
on (t.RecordDate = MV.RecordDate and t.First = MV.First)
Group by t.RecordDate, t.First
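As a possible alternative to the self-join, the per-group maximum can also be attached to each row with a window function. This is only a sketch against the #TEMP sample table from the question, and it assumes SQL Server 2005 or later, where MAX(...) OVER (PARTITION BY ...) is available. Whether it beats the join on 300 million rows is something to check in the execution plan rather than take on faith.

```sql
-- Sketch: the inner query tags every row with the maximum Value of its
-- (RecordDate, First) group, so no join back to a derived table is needed.
-- The alias x and the column name MaxValue are illustrative choices.
SELECT
    x.RecordDate,
    x.First,
    COUNT(CASE WHEN x.Value >  x.MaxValue * 0.5 THEN 1 END) AS Good,
    COUNT(CASE WHEN x.Value <= x.MaxValue * 0.5 THEN 1 END) AS Bad
FROM (
    SELECT RecordDate,
           First,
           Value,
           MAX(Value) OVER (PARTITION BY RecordDate, First) AS MaxValue
    FROM #TEMP
) AS x
GROUP BY x.RecordDate, x.First;
```

COUNT ignores NULLs, so a CASE without an ELSE branch counts only the rows that match its condition, the same as the explicit "else null" in the queries above.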
The nested queries that refer to the outer query may be causing a lot of repetitive work, since each one is evaluated per group. This calculates the MAX for every name and date in one go:
SELECT RecordDate, First, MAX(Value) AS MaxVal FROM #TEMP GROUP BY RecordDate, First
Now join back to the original data:
SELECT A.RecordDate, A.First,
SUM(CASE WHEN A.Value > B.MaxVal*0.5 THEN 1 ELSE 0 END) AS GOOD,
SUM(CASE WHEN A.Value > B.MaxVal*0.5 THEN 0 ELSE 1 END) AS BAD
FROM #TEMP A INNER JOIN
(SELECT RecordDate, First, MAX(Value) AS MaxVal
FROM #TEMP GROUP BY RecordDate, First) B
ON (A.RecordDate = B.RecordDate AND A.First = B.First)
GROUP BY A.RecordDate, A.First
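With close to 300 million rows, how the table is read will probably matter as much as the shape of the query. One thing that may help, offered as an assumption and not a measured result, is a covering index on the grouping columns so that both the MAX and the join can be answered from the index alone. The index name below is made up, and on the real table you would substitute its name for #TEMP.

```sql
-- Sketch: covering index for the GROUP BY / join columns.
-- IX_TEMP_RecordDate_First is a hypothetical name.
-- Key columns match the grouping and join columns; Value is carried
-- as an included column so the base table need not be touched.
CREATE NONCLUSTERED INDEX IX_TEMP_RecordDate_First
    ON #TEMP (RecordDate, First)
    INCLUDE (Value);
```

Compare the execution plans with and without the index before keeping it, since it adds storage and slows down inserts.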