Group by + joins
Hi I am having a problems using Group By and joins between 3 tables.
I have a project table with various fields and a projectcode fields. I then have an invoice table and an hours table and each can have multiple rows per project. Both of these table have project code also.
The two SUM values are not calculating correctly and I am realy struggling to se开发者_C百科e where the issue is.
Here the sql I am using:
SELECT dbo.project.projectcode,
dbo.project.client,
dbo.project.project,
dbo.project.budget,
dbo.project.budget * 80 AS value,
SUM(dbo.harvest.hours) AS hourslogged,
SUM(dbo.salesforce.value) AS invoiced
FROM dbo.salesforce
RIGHT OUTER JOIN dbo.project
ON dbo.salesforce.projectcode = dbo.project.projectcode
LEFT OUTER JOIN dbo.harvest
ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY dbo.project.projectcode,
dbo.salesforce.projectcode,
dbo.harvest.projectcode,
dbo.project.project,
dbo.project.client,
dbo.project.budget
Any help or tips on this would be much appreciated!
Whenever each of the two tables, dbo.salesforce
and dbo.harvest
, have more than 1 match for every projectcode
, a mini-Cartesian product happens. Here's a simple illustration. Suppose there are tables A
and B
, like this:
Table
A
:AID AVALUE --- ------- 1 ValueA1 2 ValueA2
Table
B
:BID BVALUE AID --- ------- --- 1 ValueB1 1 2 ValueB2 1 3 ValueB3 2
Now if we performed this join:
SELECT * FROM A JOIN B ON A.AID = B.AID
the result would be:
AID AVALUE BID BVALUE AID
--- ------- --- ------- ---
1 ValueA1 1 ValueB1 1
1 ValueA1 2 ValueB2 1
2 ValueA2 3 ValueB3 2
Enter table C
:
CID CVALUE AID
--- ------- ---
1 ValueC1 1
2 ValueC2 1
3 ValueC3 1
And the join now is this:
SELECT * FROM A JOIN B ON A.AID = B.AID JOIN C ON A.AID = C.AID
What would be the result? Here:
AID AVALUE BID BVALUE AID CID CVALUE AID
--- ------- --- ------- --- --- ------- ---
1 ValueA1 1 ValueB1 1 1 ValueC1 1
1 ValueA1 1 ValueB1 1 2 ValueC2 1
1 ValueA1 1 ValueB1 1 3 ValueC3 1
1 ValueA1 2 ValueB2 1 1 ValueC3 1
1 ValueA1 2 ValueB2 1 2 ValueC3 1
1 ValueA1 2 ValueB2 1 3 ValueC3 1
As you can see, every match from B
is repeated three times, for how many matches C
has got. And, similarly, every match from C
is repeated twice, because that is how many matches there are in B
. The 'luckiest', of course, is the row from A
, because it is repeated 2 × 3 = 6 times. That is a Cartesian join for you. And that's just what happens in your case too.
Not sure whether it is considered typical, but in such cases I would often group each table separately by the joining expression(s), then join the result sets. Your query would then look like this:
SELECT
p.projectcode,
p.client,
p.project,
p.budget,
p.budget * 80 AS value,
h.hourslogged,
s.invoiced
FROM dbo.project p
LEFT JOIN (
SELECT
projectcode,
SUM(dbo.salesforce.value) AS invoiced
FROM dbo.salesforce
GROUP BY projectcode
) s ON p.projectcode = s.projectcode
LEFT JOIN (
SELECT
projectcode,
SUM(dbo.harvest.hours) AS hourslogged
FROM dbo.harvest
GROUP BY projectcode
) h ON p.projectcode = h.projectcode
I'd suggest to avoid mixing right and left outer join. Your central table is Project, so use it first.
SELECT dbo.project.projectcode,
dbo.project.client,
dbo.project.project,
dbo.project.budget,
dbo.project.budget * 80 AS value,
SUM(dbo.harvest.hours) AS hourslogged,
SUM(dbo.salesforce.value) AS invoiced
FROM dbo.project
LEFT OUTER JOIN dbo.salesforce
ON dbo.salesforce.projectcode = dbo.project.projectcode
LEFT OUTER JOIN dbo.harvest
ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY dbo.project.projectcode,
dbo.project.project,
dbo.project.client,
dbo.project.budget
But the error come from the GROUP BY. You don't have to group by the two tables on which you are doing the aggregate, else your aggregate will not be good !
精彩评论