SQL JOIN question (yet another one)
Sounds simple but I'm stuck
Table A Table B
col_a col_b col_a col_c
1 b 1 c
2 c 2 d
3 z 3 a
4 d 4 e
33 a 5 k
6 l
33 b
33 b
I want to JOIN table A with B:
select * from A inner join B on A.col_a = B.col_a
I am expecting to get 5 records as a result.
Expected join result ** Actual result **
col_a col_b col_c col_x[n]... col_a col_b col_c col_y[n]...
1 b c ... 1 b c ...
2 c d ... 2 c d ...
3 z a ... 3 z a ...
4 d e ... 4 d e ...
33 a b ... 33 a b ...
开发者_JAVA技巧 33 a b ...
Why did MySQL match 33 twice? Because they are 2 values with 33 in table B.
What I want though, is just one record with the same value in col_a. How do I do that?
EDIT: I am updating the tables' design to include more columns that contain non-identical data, because them being as they were posed more questions than solved problems. Anyway, the answer to this is to use GROUP BY, but the performance penalty is huge, especially on a table that contains above 50 million records (and growing). However, the best approach to solve my problem was to use a compound statement (using UNION ALL) for every distinct value in col_a. The performance benefit was x5 ~ x10 faster !!
You have 33
twice in Table B.
Either SELECT DISTINCT
or GROUP BY col_a, ...
:
SELECT DISTINCT *
FROM A
JOIN B ON ( A.col_a = B.col_a )
;
or
SELECT *
FROM A
JOIN B ON ( A.col_a = B.col_a )
GROUP BY col_a, col_b, col_c
;
You should clean up that table, though. Depending on how many occurrences of a repeated row, it might be faster to use a subquery:
SELECT *
FROM A
JOIN (select distinct * from B) AS C
ON ( A.col_a = C.col_a )
;
The quick & dirty answer is:
select DISTINCT * from A inner join B on A.col_a = B.col_a
But the real question is, why do you have two identical entries in Table B?
Usually when you have to use DISTINCT, it indicates a problem in your data model.
精彩评论