GROUP BY does not remove duplicates

Posted 2023-03-30 12:20

I have a watchlist system that I've coded. In the overview of a user's watchlist, the user sees a list of records; however, the list shows duplicates, even though the database holds exactly the correct number of rows.

I've tried GROUP BY watch.watch_id and GROUP BY rec.record_id; none of the groupings I've tried seems to remove the duplicates. I'm not sure what I'm doing wrong.

SELECT watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username
FROM
(
    watchlist watch 

    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id

    LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE  watch.user_id = 1 
GROUP BY watch.watch_id
LIMIT 0, 25

The watchlist table looks like this:

+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
|       13 |       1 |        22 | 1314038274 |
|       14 |       1 |        25 | 1314038995 |
+----------+---------+-----------+------------+

GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.

If you need to combine rows that are duplicates in only some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:

SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP BY watch.watch_id
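
To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The two watchlist rows come from the question; the records rows (including the duplicated record_id 22) are made up purely to force the join to fan out, and the real tables may look different.

```python
import sqlite3

# In-memory database; records data is hypothetical (record 22 is stored twice).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER,
                            record_id INTEGER, watch_date INTEGER);
    CREATE TABLE records (record_id INTEGER, street_number TEXT);
    INSERT INTO watchlist VALUES (13, 1, 22, 1314038274), (14, 1, 25, 1314038995);
    INSERT INTO records VALUES (22, '10'), (22, '10'), (25, '7');
""")

# Plain join: the duplicated record makes watch_id 13 come back twice.
joined = conn.execute("""
    SELECT watch.watch_id
    FROM watchlist watch
    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
    WHERE watch.user_id = 1
    ORDER BY watch.watch_id
""").fetchall()

# Grouped join: every selected column is either grouped or aggregated.
grouped = conn.execute("""
    SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
    FROM watchlist watch
    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
    WHERE watch.user_id = 1
    GROUP BY watch.watch_id
    ORDER BY watch.watch_id
""").fetchall()

print(joined)   # [(13,), (13,), (14,)]
print(grouped)  # [(13, 2, 1314038274), (14, 1, 1314038995)]
```

The COUNT column doubles as a diagnostic: any value above 1 means the join matched more than one records row for that watch entry.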

EDIT

The OP asked for some clarification.

Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.

First, you might have completely duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 3
3 4 5

Then simply use DISTINCT:

SELECT DISTINCT * FROM V;

a b c
- - -
1 2 3
3 4 5

Or, you might have partially duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 6
3 4 5

Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:

SELECT DISTINCT a,b FROM V;

a b
- -
1 2
3 4

Or you could perform some kind of aggregation on them. You could add them up:

SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;

a b tot
- - ---
1 2 9
3 4 5

You could pick the smallest value:

SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;

a b first
- - -----
1 2 3
3 4 5

Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
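
If you want to try these variants yourself, the following sketch loads the partially duplicated rows into an in-memory SQLite table through Python's sqlite3 module. SQLite stands in for MySQL here, the table is named V only to match the discussion, and the alias "smallest" replaces "first" to stay clear of keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER);
    INSERT INTO V VALUES (1, 2, 3), (1, 2, 6), (3, 4, 5);
""")

def run(sql):
    """Execute one query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# DISTINCT over all columns: nothing collapses, because c differs.
distinct_all = run("SELECT DISTINCT * FROM V ORDER BY a, b, c")
# DISTINCT after dropping c: the first two rows merge.
distinct_ab = run("SELECT DISTINCT a, b FROM V ORDER BY a, b")
# GROUP BY with an explicit decision about c.
totals = run("SELECT a, b, SUM(c) AS tot FROM V GROUP BY a, b ORDER BY a, b")
smallest = run("SELECT a, b, MIN(c) AS smallest FROM V GROUP BY a, b ORDER BY a, b")

print(distinct_all)  # [(1, 2, 3), (1, 2, 6), (3, 4, 5)]
print(distinct_ab)   # [(1, 2), (3, 4)]
print(totals)        # [(1, 2, 9), (3, 4, 5)]
print(smallest)      # [(1, 2, 3), (3, 4, 5)]
```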

What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo; MySQL does too when ONLY_FULL_GROUP_BY is enabled, which is the default since 5.7.5) or pick one value more or less at random (MySQL with that mode disabled). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."

While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.

You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, which row's values it returns for the non-GROUP BY fields is undefined.

If you wanted to do this with a GROUP BY try something more like the following:

SELECT watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username
FROM
(
    watchlist watch 

    LEFT OUTER JOIN est8_records rec ON rec.record_id = watch.record_id

    LEFT OUTER JOIN est8_members usr ON rec.user_id = usr.user_id
)
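WHERE watch.watch_id IN (
    -- The subquery must read the watchlist table itself;
    -- "watch" is only an alias defined in the outer query.
    SELECT w.watch_id FROM watchlist w WHERE w.user_id = 1
    GROUP BY w.watch_id)
LIMIT 0, 25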

I would never recommend using SELECT DISTINCT; it can be really slow on big datasets.

Try using things like EXISTS.
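
As a rough illustration of the EXISTS approach, the sketch below lists each record on user 1's watchlist exactly once, without a join or DISTINCT. It runs on SQLite through Python's sqlite3 module rather than MySQL, and the city values and the extra watchlist rows (15 and 16) are invented for the example. Note the trade-off: the result is one row per record, so per-entry columns such as watch_date are no longer available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER, record_id INTEGER);
    CREATE TABLE records (record_id INTEGER, city TEXT);
    -- Hypothetical data: user 1 watches record 22 twice; user 2 watches record 30.
    INSERT INTO watchlist VALUES (13, 1, 22), (14, 1, 25), (15, 1, 22), (16, 2, 30);
    INSERT INTO records VALUES (22, 'Austin'), (25, 'Boston'), (30, 'Chicago');
""")

# EXISTS only asks "is there at least one matching watch entry?",
# so a record watched twice still yields a single output row.
watched = conn.execute("""
    SELECT rec.record_id, rec.city
    FROM records rec
    WHERE EXISTS (
        SELECT 1
        FROM watchlist watch
        WHERE watch.record_id = rec.record_id
          AND watch.user_id = 1
    )
    ORDER BY rec.record_id
""").fetchall()

print(watched)  # [(22, 'Austin'), (25, 'Boston')]
```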

You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.

Also, the rows displayed refer to different records, so that looks like a perfectly valid, expected result. If you are trying to select only distinct values, then you don't want to GROUP; you want to select distinct values:

SELECT DISTINCT ...

If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.

To suppress duplicates in your results, either use DISTINCT as @Laykes says, or try

GROUP BY watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username

It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.
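
One way to check that suspicion is to ask each joined table which key values occur more than once. The sketch below does this with GROUP BY ... HAVING on an in-memory SQLite database via Python's sqlite3 module. The table and column names follow the question, but the rows and usernames are invented, with record_id 22 deliberately stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (record_id INTEGER, user_id INTEGER);
    CREATE TABLE members (user_id INTEGER, username TEXT);
    -- Hypothetical data: record 22 is duplicated, members is clean.
    INSERT INTO records VALUES (22, 5), (22, 5), (25, 6);
    INSERT INTO members VALUES (5, 'alice'), (6, 'bob');
""")

# Keys that appear more than once are the ones that multiply joined rows.
dup_records = conn.execute("""
    SELECT record_id, COUNT(*)
    FROM records
    GROUP BY record_id
    HAVING COUNT(*) > 1
""").fetchall()

dup_members = conn.execute("""
    SELECT user_id, COUNT(*)
    FROM members
    GROUP BY user_id
    HAVING COUNT(*) > 1
""").fetchall()

print(dup_records)  # [(22, 2)]
print(dup_members)  # []
```

If either query returns rows against the real tables, fixing the data (or adding a unique constraint on that key) addresses the cause rather than hiding it behind DISTINCT.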
