MySQL - Oddities when doing a count, that I can't explain
Doing a simple query I'm finding two very different results, and I'm not sure how I'm coming to this conclusion. Below is the example, please let me know if its something I'm just over looking..
SELECT b.fkid as t,
b.timestamp,
count(b.fkid) as hits,
count(distinct(b.fkid)) as num,
DATE_FORMAT( b.timestamp, '%Y-%m-%d' ) AS val1
FROM a, b
WHERE a.id = b.fkid
group by val1
...result:
2 2009-09-25 08:33:42 **27** 3 2009-09-25
...compared to:
SELECT b.fkid as t,
b.timestamp,
count(b.fkid) as hits,
count(distinct(b.fkid)) as num,
DATE_FORMAT( b.timestamp, '%Y-%m-%d' ) AS val1
FROM a, b
WHERE a.id = b.fkid
group by t
..result:
2 2009-09-25 08:33:42 **39** 1 2009-09-25
3 2009-09-25 08:36:59 **6** 1 2009-09-25
10 2009-09-25 22:40:14 **4** 1 2009-09-25
I don't understand how 39+6+4 = 27? I am expecting the first value to be 49 not 27. Also tried:
SELECT b.fkid as t,
b.timestamp,
count(b.fkid) as hits,
count((b.fkid)) as num,
DATE_FORMAT( b.timestamp, '%Y-%m-%d' ) AS val1
FROM a, b
WHERE a.id = b.fkid
group by val1
...which produces:
2 2009-09-25 08:33:42 27 27 2009-09-25
Following recommendation from below I开发者_运维知识库 tried eliminating the irrelevant data and made the query:
SELECT count(b.fkid) as hits,
count(distinct(b.fkid)) as num,
DATE_FORMAT( b.timestamp, '%Y-%m-%d' ) AS val1
FROM a, b
WHERE a.id = b.fkid
group by val1
...this produced:
27 3 2009-09-25
I've tried to simplify this down to:
SELECT count(b.fkid) as hits,
count(distinct(b.fkid)) as num
FROM a, b
WHERE a.id = b.fkid
group DATE_FORMAT( b.timestamp, '%Y-%m-%d' )
...this produced:
27 3
and:
SELECT count(b.fkid) as hits,
count(distinct(b.fkid)) as num
FROM a, b
WHERE a.id = b.fkid
group b.fkid
...this produced:
39 1
6 1
4 1
In SQL you should group by all the fields that are not aggregates, not just val1.
MySQL lets you get away with not doing so(most other databases will throw an error) but it can leave you with unpredictable behavior like this , especially if val1 does not disinctly identify the aggregates you are producing.
You're counting distinct fkid values. Can there be duplicates of that value? That could change your numbers, I think.
In the first query your grouping by val1 which is the same for all the records, so you only see the first value of non aggregate functions.
In the second query you are seeing 3 results -- the number of different fkid's
hope this helps.
By formatting the date for val1, you are taking records with different timestamps (but the same calendar date) and grouping them. AKA the timestamp for "2009-12-01 01:00:00" and "2009-12-01 02:00:00" formatted with the DATE_FORMAT you specify both return "2009-12-01". BUT by not grouping on that date format in the second query, the records are staying separate.
精彩评论