开发者

SQL JOIN question (yet another one)

Sounds simple but I'm stuck

Table A                 Table B

col_a  col_b          col_a  col_c
1      b              1      c
2      c              2      d
3      z              3      a
4      d              4      e
33     a              5      k
                      6      l
                      33     b
                      33     b

I want to JOIN table A with B:

select * from A inner join B on A.col_a = B.col_a

I am expecting to get 5 records as a result.

Expected join result         ** Actual result **

col_a  col_b col_c col_x[n]...       col_a col_b col_c  col_y[n]...
1      b     c     ...                   1     b     c  ...
2      c     d     ...                   2     c     d  ...
3      z     a     ...                   3     z     a  ...
4      d     e     ...                   4     d     e  ...
33     a     b     ...                   33    a     b  ...
   开发者_JAVA技巧                                      33    a     b  ...

Why did MySQL match 33 twice? Because they are 2 values with 33 in table B.

What I want though, is just one record with the same value in col_a. How do I do that?

EDIT: I am updating the tables' design to include more columns that contain non-identical data, because them being as they were posed more questions than solved problems. Anyway, the answer to this is to use GROUP BY, but the performance penalty is huge, especially on a table that contains above 50 million records (and growing). However, the best approach to solve my problem was to use a compound statement (using UNION ALL) for every distinct value in col_a. The performance benefit was x5 ~ x10 faster !!


You have 33 twice in Table B.

Either SELECT DISTINCT or GROUP BY col_a, ...:

SELECT DISTINCT * 
FROM    A 
JOIN    B ON ( A.col_a = B.col_a )
;

or

SELECT    * 
FROM      A 
JOIN      B ON ( A.col_a = B.col_a )
GROUP BY  col_a, col_b, col_c
;

You should clean up that table, though. Depending on how many occurrences of a repeated row, it might be faster to use a subquery:

SELECT  * 
FROM    A 
JOIN    (select distinct * from B) AS C
        ON ( A.col_a = C.col_a )
;


The quick & dirty answer is:

select DISTINCT * from A inner join B on A.col_a = B.col_a

But the real question is, why do you have two identical entries in Table B?

Usually when you have to use DISTINCT, it indicates a problem in your data model.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜