SQL: find entries in 1:n relation that don't comply with condition spanning multiple rows
I'm trying to optimize SQL queries in Akonadi and came across the following problem that is apparently not easy to solve with SQL, at least for me:
Assume the following table structure (should work in SQLite, PostgreSQL, MySQL):
CREATE TABLE a (
a_id INT PRIMARY KEY
);
INSERT INTO a (a_id) VALUES (1), (2), (3), (4);
CREATE TABLE b (
b_id INT PRIMARY KEY,
a_id INT,
name VARCHAR(255) NOT NULL
);
INSERT INTO b (b_id, a_id, name)
VALUES (1, 1, 'foo'), (2, 1, 'bar'), (3, 1, 'asdf'),
(4, 2, 'foo'), (5, 2, 'bar'), (6, 3, 'foo');
Now my problem is to find entries in a
that are missing name
entries in table b
. E.g. I need to make sure each entry in a
has at least the name
entries "foo"
and "bar"
in table b
. Hence the query should return something similar to:
a_id = 3 is missing name "bar"
a_id = 4 is missing name "foo" and "bar"
Since both tables are potentially huge in Akonadi, performance is of utmost importance.
One solution in MySQL would be:
SELECT a.a_id,
CONCAT('|', GROUP_CONCAT(name ORDER BY NAME ASC SEPARATOR '|'), '|') as names
FROM a
LEFT JOIN b 开发者_开发知识库USING( a_id )
GROUP BY a.a_id
HAVING names IS NULL OR names NOT LIKE '%|bar|foo|%';
I have yet to measure the performance tomorrow, but severly doubt it's any fast for tens of thousand of entries in a
and thrice as many in b
. Furthermore we want to support SQLite and PostgreSQL where to my knowledge the GROUP_CONCAT
function is not available.
Thanks, good night.
This should work with any SQL standard RDBMS:
SELECT
a.a_id,
Foo.b_id as Foo_Id,
Bar.b_id as Bar_Id
FROM a
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'foo') as Foo ON
a.a_id = Foo.a_id
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'bar') as Bar ON
a.a_id = Bar.a_id
WHERE
Foo.a_id IS NULL
OR Bar.a_id IS NULL
Well, you could do with some definition in the database of which are the required elements. So I'll create one:
CREATE TABLE required(name varchar(255) primary key);
INSERT INTO required VALUES('foo'), ('bar');
(this could be a temporary table or just an inline union of constants if it's dynamic)
Now the set of rows we expect to find in b is given by:
SELECT a.a_id, required.name FROM a CROSS JOIN required;
So we outer join this set against b to determine what's present and what's not:
SELECT a.a_id, required.name, b.b_id
FROM a
CROSS JOIN required
LEFT JOIN b ON b.a_id = a.a_id AND b.name = required.name;
or alternatively:
SELECT a.a_id, required.name
FROM a CROSS JOIN required
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.a_id AND b.name = required.name);
Assuming there's an index (and it seems likely from your description to be a uniqueness constraint) on b(a_id,name) that should work nicely. To some extent or another, it will scan a and cross check against b using the index.
I got a nice tip in #sql on freenode by Ari-Ugwu and Xgc: Using the CrossTab pattern:
SELECT a.a_id, SUM(name = "foo") as hasFoo, SUM(name = "bar") as hasBar, ...
FROM a
LEFT JOIN b USING (a_id)
GROUP BY a.a_id
HAVING hasFoo < 1 OR hasFoo IS NULL OR hasBar < 1 OR hasBar IS NULL...;
turns out that none of these are any faster than simply doing that stuff in the program itself... and the latter is much easier to do, hence I opted for that after all.
精彩评论