Why is this SQL SUM statement correct?
In my schema, I have a table Projects, and a table Tasks. Each project is comprised of tasks. Each task has Hours and PercentComplete.
Example table:
ProjectID TaskID Hours PercentComplete 1 1 100 50 1 2 120 80
I am trying to get the weighted percentage complete for the project. I am doing this using the following SQL statement:
SELECT P.ProjectID, P.ProjectName, SUM(T.Hours) AS Hours,
SUM(开发者_StackOverflowT.PercentComplete * T.Hours) / 100 AS CompleteHours,
SUM(T.PercentComplete * T.Hours) / SUM(T.Hours) AS PercentComplete
FROM Projects AS P INNER JOIN
Tasks AS T ON T.ProjectID = P.ProjectID
WHERE (P.ProjectID = 1)
My question is about this part of that statement:
SUM(T.PercentComplete * T.Hours) / SUM(T.Hours) AS PercentComplete
This gives me the correct weighted percentage for this project (in the case of the sample data above, 66%). But I cannot seem to wrap my head around why it does this.
Why does this query work?
SUM(T.PercentComplete * T.Hours) / 100
is the number of complete hours.SUM(T.Hours)
is the total number of hours.The ratio of these two amounts, i.e.:
(SUM(T.PercentComplete * T.Hours) / 100) / SUM(T.Hours)
is the proportion of hours complete (it should be between 0 and 1).
- Multiplying this by 100 gives the percentage.
I prefer to keep percentages like this out of the database and move them to the presentation layer. It would be much easier if the database stored "hours completed" and "hours total" and did not store the percentages at all. The extra factors of 100 in the calculations confuse the issue.
Basically you are finding the number of hours completed over the number of hours total.
SUM(T.PercentComplete * T.Hours)
computes the total number of hours that you have completed. (100 * 50) = 50 * 100 + (120 * 80) = 146 * 100
is the numerator. 146
hours have been completed on this job, and we keep a 100
multiplier for the percent (because it is [0-100]
instead of [0-1]
)
Then we find the total number of hours worked, SUM(T.Hours)
, which is 100 + 120 = 220
.
Then dividing, we find the weighted average. (146 * 100) / 220 = 0.663636364 * 100 = 66.4%
Is this what you were wondering about?
It calculates the two sums individually by adding up the value for each row then divides them at the end
SUM(T.PercentComplete * T.Hours)
50* 100 +
80 * 120
-------
14,600
SUM(T.Hours)
100 +
120
---
220
Then the division at the end
14,600 / 220
------------
66.3636
Edit As per HLGEM's comment it will actually return 66 due to integer division.
Aggregate functions, such as SUM(), work against the set of data defined by the GROUP BY clause. So if you group by ProjectID, ProjectName, the functions will break things down by that.
The SUM peratiorn first multiply the columns than add
( 100* 50+ 120* 80) / (100+ 120)
精彩评论