开发者

Group by + joins

Hi I am having a problems using Group By and joins between 3 tables.

I have a project table with various fields and a projectcode fields. I then have an invoice table and an hours table and each can have multiple rows per project. Both of these table have project code also.

The two SUM values are not calculating correctly and I am realy struggling to se开发者_C百科e where the issue is.

Here the sql I am using:

SELECT  dbo.project.projectcode, 
        dbo.project.client, 
        dbo.project.project, 
        dbo.project.budget, 
        dbo.project.budget * 80 AS value, 
        SUM(dbo.harvest.hours) AS hourslogged, 
        SUM(dbo.salesforce.value) AS invoiced
FROM  dbo.salesforce 
    RIGHT OUTER JOIN dbo.project 
        ON dbo.salesforce.projectcode = dbo.project.projectcode 
    LEFT OUTER JOIN dbo.harvest 
        ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY    dbo.project.projectcode, 
            dbo.salesforce.projectcode, 
            dbo.harvest.projectcode, 
            dbo.project.project, 
            dbo.project.client, 
            dbo.project.budget

Any help or tips on this would be much appreciated!


Whenever each of the two tables, dbo.salesforce and dbo.harvest, have more than 1 match for every projectcode, a mini-Cartesian product happens. Here's a simple illustration. Suppose there are tables A and B, like this:

  • Table A:

    AID  AVALUE
    ---  -------
    1    ValueA1
    2    ValueA2
    
  • Table B:

    BID  BVALUE   AID
    ---  -------  ---
    1    ValueB1  1
    2    ValueB2  1
    3    ValueB3  2
    

Now if we performed this join:

SELECT * FROM A JOIN B ON A.AID = B.AID

the result would be:

AID  AVALUE   BID  BVALUE   AID
---  -------  ---  -------  ---
1    ValueA1  1    ValueB1  1
1    ValueA1  2    ValueB2  1
2    ValueA2  3    ValueB3  2

Enter table C:

CID  CVALUE   AID
---  -------  ---
1    ValueC1  1
2    ValueC2  1
3    ValueC3  1

And the join now is this:

SELECT * FROM A JOIN B ON A.AID = B.AID JOIN C ON A.AID = C.AID

What would be the result? Here:

AID  AVALUE   BID  BVALUE   AID  CID  CVALUE   AID
---  -------  ---  -------  ---  ---  -------  ---
1    ValueA1  1    ValueB1  1    1    ValueC1  1
1    ValueA1  1    ValueB1  1    2    ValueC2  1
1    ValueA1  1    ValueB1  1    3    ValueC3  1
1    ValueA1  2    ValueB2  1    1    ValueC3  1
1    ValueA1  2    ValueB2  1    2    ValueC3  1
1    ValueA1  2    ValueB2  1    3    ValueC3  1

As you can see, every match from B is repeated three times, for how many matches C has got. And, similarly, every match from C is repeated twice, because that is how many matches there are in B. The 'luckiest', of course, is the row from A, because it is repeated 2 × 3 = 6 times. That is a Cartesian join for you. And that's just what happens in your case too.

Not sure whether it is considered typical, but in such cases I would often group each table separately by the joining expression(s), then join the result sets. Your query would then look like this:

SELECT
  p.projectcode, 
  p.client, 
  p.project, 
  p.budget, 
  p.budget * 80 AS value, 
  h.hourslogged, 
  s.invoiced
FROM dbo.project p
  LEFT JOIN (
    SELECT
      projectcode,
      SUM(dbo.salesforce.value) AS invoiced
    FROM dbo.salesforce
    GROUP BY projectcode
  ) s ON p.projectcode = s.projectcode
  LEFT JOIN (
    SELECT
      projectcode,
      SUM(dbo.harvest.hours) AS hourslogged
    FROM dbo.harvest 
    GROUP BY projectcode
  ) h ON p.projectcode = h.projectcode


I'd suggest to avoid mixing right and left outer join. Your central table is Project, so use it first.

SELECT  dbo.project.projectcode, 
        dbo.project.client, 
        dbo.project.project, 
        dbo.project.budget, 
        dbo.project.budget * 80 AS value, 
        SUM(dbo.harvest.hours) AS hourslogged, 
        SUM(dbo.salesforce.value) AS invoiced
FROM    dbo.project      
            LEFT OUTER JOIN dbo.salesforce
                ON dbo.salesforce.projectcode = dbo.project.projectcode 
            LEFT OUTER JOIN dbo.harvest 
                ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY    dbo.project.projectcode, 
            dbo.project.project, 
            dbo.project.client, 
            dbo.project.budget

But the error come from the GROUP BY. You don't have to group by the two tables on which you are doing the aggregate, else your aggregate will not be good !

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜