'in' and 'not in' counts do not add up - what's wrong?
I have some servers. Some of them have ips assigned. I want to figure out how many do not. There are clearly more servers than have ips assigned, but my db tells me there are no servers that have no ips assigned...
I'm at my wit's end here. Is my DB corrupted in some strange way?
SELECT COUNT(*)
FROM server
...returns:
+----------+
| count(*) |
+----------+
| 23088 |
+----------+
1 row in set (0.00 sec)
This:
SELECT COUNT(*)
FROM server
WHERE server_id IN (SELECT DISTINCT(server_id)
FROM ips)
...returns:
+----------+
| count(*) |
+----------+
| 13811 |
+----------+
1 row in set (0.01 sec)
This:
SELECT COUNT(*)
FROM server
WHERE server_id NOT IN (SELECT DISTINCT(server_id)
FROM ips);
...returns:
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.02 sec)
Resul开发者_高级运维ts have been edited to protect the guilty, but you get the idea.
- All tables are InnoDB.
Check table
returns ok on both of these tables.
EDIT: thank you for the suggestion of using LEFT JOIN
. This definitely confirms that the problem is the MySQL bug.
mysql> SELECT count(s.server_id) FROM server s LEFT JOIN ips i on s.server_id = i.server_id WHERE i.server_id IS NULL;
+--------------------+
| count(s.server_id) |
+--------------------+
| 9277 |
+--------------------+
1 row in set (0.04 sec)
Since 9277 + 13811 = 23088, this means that all servers without ips + all servers with ips does indeed == all servers.
I've scheduled an upgrade of the mysql server for beginning of next week. Stay tuned.
What version of MySQL? There seems to be a bug in < 5.0.25 / 5.1.12 that might be the culprit:
Bug #21282: NOT IN, more than 1000 returns incorrect results with INDEX:
Using a
SELECT ... WHERE some_field NOT IN (...)
and then 1000 or more values in the NOT IN part causes the server to return incorrect results if there is an INDEX/UNIQUE key on some_field. Less than 1000 criteria works correctly.
Do you have any Nulls in your columns?
The server_id not in (ids)
does not match NULL
columns, so you only get the servers with a non-NULL server_id
that isn't among those in ips
. You'll want to use where server_id is null
instead.
Assuming the bug truppo found causes this, you could use this workaround:
select count(*)
from server s
left join ips i on i.server_id = s.server_id
where i.server_id is null
Above, i.server_id is null
is true if the left join
did not find a match (just like all columns from i
would yield null
for that situation).
Do you have any record that has a NULL for server_id ? Because it would be excluded in both case.
if you have NULL in your columns, they will evaluate to false in both cases. the result you are getting is in
+ not in
- nulls
select count(*)
from server
where server_id not in (select distinct(server_id) from ips)
or server_id is NULL
I would assume that there is something strange going on with the IN and NOT IN. Might be a bug or a "known limitation".
I'd suggest to first try to answer your initial question (servers without an ip) and then have a look at the data .. maybe that gives you an indication on what might be going on.
So here are some alternative ideas to give you what you are looking for:
SELECT server_id
FROM server
MINUS
SELECT server_id
FROM ips
Or
SELECT server_id
FROM server s LEFT JOIN ips i on s.server_id = i.server_id
WHERE i.server_id is null
As said above, this may give you an idea on why the data is not "caught" by your original statements.
精彩评论