开发者

Need help with a sql query that has an inner and outer join

I really need help getting this query right. I can't share actual table and column names, but will try my best to layout the problem simply.

Assume the following tables. The tables and keys CANNOT be changed. Period. I don't care if you think it's a bad design, this question isn't a design question, it's on SQL syntax.

  • Table A - Primary key named id1
  • Table B - Contains two foreign keys, TableA.id1 and Foo.id2(ignore Foo, it doesn't matter for this)
  • Table C - Contains two foreign keys, TableA.id1 and Foo.id2, additional interesting columns.

Constraints:

  1. The SQL gets a set of id1s passed in as an argument.
  2. It must return a list of Table C rows.
  3. It must only return Table C rows where a Table B row exists with a matching TableA.id1 and Foo.id2 - There ARE rows in Table C that don't match Table B
  4. A row MUST be returned for every id1 passed in, even if no Table C row开发者_开发技巧 exists.

At first I tried a Left Outer Join from Table A to Table B then an Inner Join to Table C. That violates the 4th rule above, as the Inner Join drops out those rows.

Next I tried two Left Outer joins. This is closer, but has the side effect of including rows that match the Table A join to Table B, but don't have a corresponding Table C entry, which isn't what I want.

So, here's what I came up with.

SELECT
  a.id1,
  c.*
FROM
  TableB b
INNER JOIN 
  TableC c USING (id1,id2)
RIGHT OUTER JOIN
  TableA a USING (id1)
WHERE
  a.id1 in (x,y,z)

I'm a bit wary of a Right Outer Join, as the documentation I've read says it can be replaced with a Left Outer, but it doesn't appear so for this case. It also seems a bit rare, which is making other devs nervous, so I'm being cautious.

So, three questions in one.

  1. Is this correct?
  2. Did I use the Right Outer Join correctly?
  3. Is there a cleaner way to achieve the same thing?

EDIT: DB is MySQL


You can rewrite it as a LEFT OUTER JOIN by using parentheses. In pseudo-SQL change this:

SELECT ...
FROM b
INNER JOIN c ON ...
RIGHT OUTER JOIN a ON ...

to this:

SELECT ...
FROM a
LEFT OUTER JOIN (
    b INNER JOIN c ON ...
) ON ...


You can use an EXISTS clause, which sometimes works better

SELECT
  a.id1,
  c.*
FROM TableA a
LEFT JOIN TableC c
 ON c.id1 = a.id1 AND EXISTS (
    select *
    from TableB b
    where b.id1=c.id1 and b.id2=c.id2)
WHERE
  a.id1 in (x,y,z)

As you have written it, it works because ANSI JOINs are always processed top to bottom. Since you need to test B against C before joining to A, it is about the only way to write it without introducing a subquery [(B x C) RIGHT JOIN A]. However, a bad query plan could perform all records in B and C (B x C) before right joining to A.

The EXISTS method efficiently uses the filter on A, then LEFT JOINs to C and for each C found, validates that it also exists in B (or discards).

Q's

  1. Yes your query is correct
  2. Yes
  3. EXISTS should work better


Yeah, you need to start with TableA and then add tables B and C using joins. The only reason you even need TableA is to make sure you have a row for each parameter.

Select a.id1,c.*
From
  TableA a
  Left Join TableB b on a.id1=b.id1
  Left Join TableC c on b.id1=c.id1 and b.id2=c.id2
Where a.id1 in (x,y,z)

You need to do OUTER joins all the way across, or rows that are missing in B will also cause data from A to be filtered out of the result set. By joining C to B (instead of directly to A) you are using B to filter. You could do it with a complicated EXISTS clause, but this is cleaner.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜