SQL: find entries in 1:n relation that don't comply with condition spanning multiple rows

2022-12-30 17:00 问答作者：

I'm trying to optimize SQL queries in Akonadi and came across the following problem that is apparently not easy to solve with SQL, at least for me:

Assume the following table structure (should work in SQLite, PostgreSQL, MySQL):

CREATE TABLE a (
  a_id INT PRIMARY KEY
);

INSERT INTO a (a_id) VALUES (1), (2), (3), (4);

CREATE TABLE b (
  b_id INT PRIMARY KEY,
  a_id INT,
  name VARCHAR(255) NOT NULL
);

INSERT INTO b (b_id, a_id, name)
       VALUES (1, 1, 'foo'), (2, 1, 'bar'), (3, 1, 'asdf'),
              (4, 2, 'foo'), (5, 2, 'bar'), (6, 3, 'foo');

Now my problem is to find entries in a that are missing name entries in table b. E.g. I need to make sure each entry in a has at least the name entries "foo" and "bar" in table b. Hence the query should return something similar to:

a_id = 3 is missing name "bar"
a_id = 4 is missing name "foo" and "bar"

Since both tables are potentially huge in Akonadi, performance is of utmost importance.

One solution in MySQL would be:

SELECT a.a_id,
       CONCAT('|', GROUP_CONCAT(name ORDER BY NAME ASC SEPARATOR '|'), '|') as names
  FROM a
  LEFT JOIN b 开发者_开发知识库USING( a_id )
  GROUP BY a.a_id
  HAVING names IS NULL OR names NOT LIKE '%|bar|foo|%';

I have yet to measure the performance tomorrow, but severly doubt it's any fast for tens of thousand of entries in a and thrice as many in b. Furthermore we want to support SQLite and PostgreSQL where to my knowledge the GROUP_CONCAT function is not available.

Thanks, good night.

This should work with any SQL standard RDBMS:

SELECT 
   a.a_id, 
   Foo.b_id as Foo_Id,
   Bar.b_id as Bar_Id
FROM a
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'foo') as Foo ON
   a.a_id = Foo.a_id
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'bar') as Bar ON
   a.a_id = Bar.a_id
WHERE
   Foo.a_id IS NULL
   OR Bar.a_id IS NULL

Well, you could do with some definition in the database of which are the required elements. So I'll create one:

CREATE TABLE required(name varchar(255) primary key);
INSERT INTO required VALUES('foo'), ('bar');

(this could be a temporary table or just an inline union of constants if it's dynamic)

Now the set of rows we expect to find in b is given by:

SELECT a.a_id, required.name FROM a CROSS JOIN required;

So we outer join this set against b to determine what's present and what's not:

SELECT a.a_id, required.name, b.b_id
FROM a
     CROSS JOIN required
     LEFT JOIN b ON b.a_id = a.a_id AND b.name = required.name;

or alternatively:

SELECT a.a_id, required.name
FROM a CROSS JOIN required
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.a_id AND b.name = required.name);

Assuming there's an index (and it seems likely from your description to be a uniqueness constraint) on b(a_id,name) that should work nicely. To some extent or another, it will scan a and cross check against b using the index.

I got a nice tip in #sql on freenode by Ari-Ugwu and Xgc: Using the CrossTab pattern:

SELECT a.a_id, SUM(name = "foo") as hasFoo, SUM(name = "bar") as hasBar, ...
  FROM a
  LEFT JOIN b USING (a_id)
  GROUP BY a.a_id
  HAVING hasFoo < 1 OR hasFoo IS NULL OR hasBar < 1 OR hasBar IS NULL...;

turns out that none of these are any faster than simply doing that stuff in the program itself... and the latter is much easier to do, hence I opted for that after all.

继续阅读：postgresql sql

SQL: find entries in 1:n relation that don't comply with condition spanning multiple rows

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？