Inconsistent COUNT with DISTINCT
I have partitioned a table by the field daynumber
. During the day I write into this table logs and then calculate some stats. The table is big; each day I have ~3M new rows. The field myField
is indexed.
This query
SELECT COUNT(DISTINCT myField) FROM mytable WHERE daynumber=somevalue;
returns 0, which is a mistake.
This query
SELECT COUNT(*) FROM (SELECT DISTINCT(myField) FROM mytable WHERE daynumber=somevalue) t;
returns the correct value.
For some daynumber
values, the first query works fine. I tried to repeir that partition but that had no effect. Any suggestions?
update
table scheme looks like
CREATE TABLE `mytable` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`daynumber` INT(10) UNSIGNED NOT NULL,
`myField` VARCHAR(1024) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
... other fields
PRIMARY KEY (`daynumber`,`id`),
KEY `myField` (`myField`(20))
) ENGINE=MYISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
PARTITION BY LIST (daynumber)
(PARTITION day_810 VALUES IN (810) ENGINE = MyISAM,
PARTITION d开发者_开发知识库ay_811 VALUES IN (811) ENGINE = MyISAM,
PARTITION day_812 VALUES IN (812) ENGINE = MyISAM
....)
this is because your myfield
contains NULL
value
SELECT COUNT(DISTINCT coalesce(myField, '') )
FROM mytable WHERE daynumber=somevalue;
coalesce
is sort of convert NULL
to ''
might not what your ask for, but will return the correct count (still)
Lets' say
- You match 4 rows for daynumber=somevalue
- MyField for these 3 rows are
1, 2, 2, NULL
So, with the sub query there are 2 steps.
SELECT DISTINCT(myField)
gives 3 rows:1, 2, NULL
SELECT COUNT(*)
of this gives 3.
Without the subquery, one step
SELECT COUNT(DISTINCT(myField))
counts only1, 2
= 2
Reason:
COUNT(*)
includes NULLsCOUNT(anythingelse)
doesn't count NULLs
The queries can't be inconsistent because they are different queries
The best discussion on it I can recall is on DBA.SE: What is the difference between select count(*) and select count(any_non_null_col)?
精彩评论