Efficiently selecting from many-to-many relation in H2
I'm using H2, and I have a database of books (table Entries) and authors (table Persons), connected through a many-to-many relationship, itself stored in a table Authorship. The database is fairly large (900'000+ persons and 2.5M+ books).
I'm trying to efficiently select the list of all books authored by at least one author whose name matches a pattern (LIKE '%pattern%'). The trick here is that the pattern should severly restrict the number of matching authors, and each author has a reasonably small number of associated books.
I tried two queries:
SELECT p.*, e.title FROM (SELECT * FROM Persons WHERE name LIKE '%pattern%') AS p
INNER JOIN Authorship AS au ON au.authorId = p.id
INNER JOIN Entries AS e ON e.id = au.entryId;
and:
SELECT p.*, e.title FROM Persons AS p
INNER JOIN Authorship AS au ON au.au开发者_运维知识库thorId = p.id
INNER JOIN Entries AS e ON e.id = au.entryId
WHERE p.name like '%pattern%';
I expected the first one to be much faster, as I'm joining a much smaller (sub)table of authors, however they both take as long. So long in fact that I can manually decompose the query into three selects and find the result I want faster.
When I try to EXPLAIN the queries, I observe that indeed they are very similar (a full join on the tables and only then a WHERE clause), so my question is: how can I achieve a fast select, that relies on the fact that the filter on authors should result in a much smaller join with the other two tables?
Note that I tried the same queries with MySQL and got results in line with what I expected (selecting first is much faster).
Thank you.
OK, here is something that finally worked for me.
Instead of running the query:
SELECT p.*, e.title FROM (SELECT * FROM Persons WHERE name LIKE '%pattern%') AS p
INNER JOIN Authorship AS au ON au.authorId = p.id
INNER JOIN Entries AS e ON e.id = au.entryId;
...I ran:
SELECT title FROM Entries e WHERE id IN (
SELECT entryId FROM Authorship WHERE authorId IN (
SELECT id FROM Persons WHERE name LIKE '%pattern%'
)
)
It's not exactly the same query, because now I don't get the author id as a column in the result, but that does what I wanted: take advantage of the fact that the pattern restricts the number of authors to a very small value to search only through a small number of entries.
What is interesting is that this worked great with H2 (much, much faster than the join), but with MySQL it is terribly slow. (This has nothing to do with the LIKE '%pattern%' part, see comments in other answers.) I suppose queries are optimized differently.
SELECT * FROM Persons WHERE name LIKE '%pattern%'
will always take LONG on a 900,000+ row table no matter what you do because when your pattern '%pattern%'
starts with a %
MySql can't use any indexes and should do a full table scan. You should look into full-text indexes and function.
Well, since the like condition starts with a wildcard it will result in a full table scan which is always slow, no internal caching can take place.
If you want to do full text searches, mysql is not the best bet you have. Look into other software (solr for instance) to solve this kind of problems.
精彩评论