Select parent row only if it has no children
I have a MySQL database in which table A has a one-to-many relation to table B, and I would like to select all rows in table B that have no children in table A. I have tried using
SELECT id FROM A WHERE NOT EXISTS (SELECT * FROM B WHERE B.id=A.id)
and
SELECT id FROM A LEFT JOIN B ON A.id=B.id WHERE B.id IS NULL
Both of these seem slow. Is there a faster query to achieve the same thing?
In case this is relevant, in my database table A has about 500,000 rows and table B has about 3 to 4 million rows.
Edit: For the actual tables in my database, explain gives me:
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
| 1 | PRIMARY | frontend_form471 | index | NULL | frontend_form471_61a633e8 | 32 | NULL | 671927 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | SchoolData | index | PRIMARY | PRIMARY | 49 | NULL | 3121110 | Using where; Using index |
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
for
select number from frontend_form471 where not exists (select * from SchoolData where SchoolData.`f471 Application Number`=frontend_form471.number)
and
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | frontend_form471 | index | NULL | frontend_form471_61a633e8 | 32 | NULL | 671927 | Using index; Using temporary |
| 1 | SIMPLE | SchoolData | index | PRIMARY | PRIMARY | 49 | NULL | 3121110 | Using where; Using index; Not exists; Distinct |
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
for
select distinct number from frontend_form471 left join SchoolData on frontend_form471.number=SchoolData.`f471 Application Number` where SchoolData.`f471 Application Number` is NULL
where in my case frontend_form471 is table A and SchoolData is table B
Edit2: In table B (SchoolData) in my database, id is the first part of a two part primary key, so it is indexed and ther开发者_C百科e are still multiple entries in B with the same id.
SELECT id FROM A LEFT OUTER JOIN B ON A.id=B.id WHERE B.id IS NULL
you can do this. the outer join should bring a little performance, but not much.
new database systems will probably optimize your query anyway so that there wont be any difference.
the correct way here is caching! try the query cacher and application level caching if possible.
of course you need proper indexes.
and by proper i mean on both tables and preferably a hash index as it will have static lookup time in comparision to any tree that has logarithmic
Try putting an explain before the query to see what really slows this down.
if you really need this to be fast you may re-facture your data structure.
you could possibly create a trigger to mark a flag in table A whether there is a corresponding entry in table be. of course this id data redundancy, but sometimes its worth it. just think of it as caching.
one last thought: you could try SELECT id FROM A WHERE id NOT IN (SELECT id FROM B)
it may be a little faster because no actual joining is necessary, however it may also be slower because the lookup in the set of be will be a full scan. I am not really sure how this will be processed but it may be worth a try.
It's going to be slow no matter how you look at it. Worst case performance is going to be a full cross join creating 2 trillion potential matches (4 mill * 500k).
The second one will most likely perform faster, since it's a single query.
Your indexing is poor.
For all forms (EXISTS, IN, LEFT JOIN) you should have indexes on id in both tables
You could try
SELECT id FROM A WHERE A.id NOT IN (SELECT id FROM B)
but i don't know if this will be any faster. I would have tried the left join first. I think your problem is more to do with indexes. Do you have indexes on both id fields.
Be sure to have an index on A.id
and another one on B.id
.
What seems a like bit strange is that you join A.id with B.id. Is B.id the foreign key to A or is it the primary key of B?
If your schema is something like this:
CREATE TABLE b(
id int,
value varchar(255)
)
CREATE TABLE a(
id int,
father_id int,
value varchar(255)
)
If you want all the rows of table A that don't have child in table A why you don't try something like:
SELECT * FROM B WHERE id NOT IN (SELECT father_id FROM A GROUP BY father_id)
I haven't tested but i think that it's fester. Remember to put an index over id
Hope this helps
Why not try empty value instead of NULL. In SQL, the NULL value is never true in comparison to any other value, even NULL. An expression that contains NULL always produces a NULL value unless otherwise indicated in the documentation for the operators and functions involved in the expression.
精彩评论