Can somebody explain the running total and SQL self-join in this tutorial to me?
I was readi开发者_如何转开发ng over the tutorial here: http://www.1keydata.com/sql/sql-running-totals.html and it all made sense until it suddenly got extremely ridiculously unbelievably complicated when it got to rank, median, running totals, etc. Can somebody explain in plain English how that query results in a running total? Thanks!
Before I get started, I've not seen this before and it doesn't look like a terribly comprehensible way to accomplish a running total.
Okay, here's the query from the tutorial:
SELECT a1.Name, a1.Sales, SUM(a2.Sales) Running_Total
FROM Total_Sales a1, Total_Sales a2
WHERE a1.Sales <= a2.sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
GROUP BY a1.Name, a1.Sales
ORDER BY a1.Sales DESC, a1.Name DESC;
And the sample output
Name Sales Running_Total
Greg 50 50
Sophia 40 90
Stella 20 110
Jeff 20 130
Jennifer 15 145
John 10 155
The simple part of this query is displaying the sales data for each employee. All we're doing is selecting name
and sales
from each employee and ordering them by the sale amount (descending). This gives us our base list.
Now for the running total, we want every row that has already been displayed. So, we join the table against itself, on each row that would already have been displayed:
WHERE a1.Sales <= a2.sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
Then we use the SUM
aggregate function and group accordingly. A good way to understand this is if you look at what would happen if you didn't use the group function. The 'Sophia' row would look like this:
Name A1.Sales A2.Sales
Sophia 40 50
Sophia 40 40
Notice how we got Greg's sales row? The group will sum that up, and voila!
Hope that helps. Joe
The first table joins to itself, the join resulting in x number of rows, where x is the number of rows that have total sales lower than itself, or the name in the row is the same (i.e. all those sales previous to the row we are looking at, when ordered by sales amount).
It then groups on the fields in the left side of the join and sums the rows we join to, thus a running total. To see how it works, you might want to run it without the sum and grouping, to see the raw results returned.
The SQL above gives a different result on Sybase (ASE 15). I think the reason is that the 'order by' is not applied until display time. Here is the SQL and the result:
drop table Total_Sales
go
create table Total_Sales
(
Name char(15),
Sales int
)
INSERT INTO Total_Sales VALUES( 'John', 10 )
INSERT INTO Total_Sales VALUES( 'Jennifer', 15)
INSERT INTO Total_Sales VALUES('Stella', 20 )
INSERT INTO Total_Sales VALUES('Sophia', 40 )
INSERT INTO Total_Sales VALUES('Greg', 50 )
INSERT INTO Total_Sales VALUES('Jeff', 20 )
SELECT a1.Name, a1.Sales, SUM(a2.Sales) Running_Total
FROM Total_Sales a1, Total_Sales a2
WHERE a1.Sales <= a2.Sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
GROUP BY a1.Name, a1.Sales
ORDER BY a1.Sales DESC, a1.Name DESC
Result:
Name Sales Running_Total
Greg 50 50
Sophia 40 90
Stella 20 130 --note that two running totals are the same!
Jeff 20 130
Jennifer 15 145
John 10 155
Bob
I also get the same incorrect output as Bob above where the running total breaks down at Stella & Jeff, who have the same sales number. I'm using SQL Server 2014 Management Studio Express. I don't think the website's solution is actually correct. I did the join based on name instead of on sales and came up with these, which produce a correct running total:
select a1.name
, a1.sales
, sum(a2.sales) 'running_total'
from #total_sales a1
inner join #total_sales a2 on a1.name <= a2.name
group by a1.name, a1.sales
order by sum(a2.sales);
Yields:
name sales running_total
Stella 20 20
Sophia 40 60
John 10 70
Jennifer 15 85
Jeff 20 105
Greg 50 155
You could also do the variant below if you're uncomfortable sorting on an aggregate. It changes the order, but the running total is still correct:
select a1.name
, a1.sales
, sum(a2.sales) 'running_total'
from #total_sales a1
inner join #total_sales a2 on a1.name >= a2.name
group by a1.name, a1.sales
order by a1.name;
Yields:
name sales running_total
Greg 50 50
Jeff 20 70
Jennifer 15 85
John 10 95
Sophia 40 135
Stella 20 155
精彩评论