How do you select values that are NOT a superset to a list?
I am trying to fetch values based on an allowlist and denylist. The table assigns multiple values to an id.
The allowlist requires all values to be met, so an allowlist of ['a', 'b']
would only return ids that have every item in the values column. This part of the query is done and works as e开发者_JS百科xpected.
For simplicity, I return every id where one of the values equals 'a'
/*
Table Query result
┌────┬────┐ ┌───┐
│id │val ├──►│id │
├────┼────┤ ├───┤
│1 │ a │ │1 │
│1 │ c │ │3 │
│1 │ b │ └───┘
│2 │ b │
│3 │ a │
│3 │ c │
└────┴────┘
*/
For the next step, I want to create a denylist, where the previously filtered ids must not have a corresponding value.
To accomplish this, I wish to return a row of values where each value x
meets the following properties:
x
does not occur in the allowlist- if ids with
x
were banned, the allowlist must still return at least one row. x
is distinct
However, I have trouble figuring out how to approach the denylist query. It makes little sense to return denylist options that would nullify allowlist values.
How should I approach this problem? How do I only return values that are NOT a superset to a
?
Consider the following example of what the query output should be:
/*
...WHERE val
IN ('a')...
Table Query result
┌────┬────┐ ┌────┬─────┐
│id │val ├──►│val │count│
├────┼────┤ ├────┼─────┤
│1 │ a │ │b │ 2 │
│1 │ c │ │c │ 2 │
│1 │ b │ └────┴─────┘
│2 │ b │
│3 │ a │
│3 │ c │
└────┴────┘
a is always accompanied by c. If you denylist c,
a will never return, despite being on the allowlist.
c is a pointless option for a denylist.
if b is on the denylist, id 3 is still valid. B is a
purposeful option.
How do I return this instead?
...WHERE val
IN ('a')...
Table Query result
┌────┬────┐ ┌────┬─────┐
│id │val ├───►│val │count│
├────┼────┤ ├────┼─────┤
│1 │ a │ │b │ 2 │
│1 │ c │ └────┴─────┘
│1 │ b │
│2 │ b │
│3 │ a │
│3 │ c │
└────┴────┘
*/
After some tinkering, I have found a solution.
Assuming the query is based on this structure:
SELECT
`value`,
COUNT(*) as count
FROM testTable
WHERE `id` IN (
SELECT `id` FROM testTable
WHERE `value` IN ('a')
GROUP BY `id`
HAVING COUNT(DISTINCT testTable.`value`) = 1
)
GROUP BY `value` ORDER BY count DESC;
It will return every value that shares the same id as the allowlist, along with its count. Since all values in the allowlist must occur in the result, their count will respectively be the highest:
┌─────┬─────┐
│value│count│
├─────┼─────┤
│a │ 2 │
│b │ 1 │
│c │ 2 │
└─────┴─────┘
Therefore, if a different value has the same count, it occurs in all of the returning rows. One way to obtain the maximum value is a subquery with LIMIT 1
:
SELECT
COUNT(*) as count
FROM testTable
WHERE `id` IN (
SELECT `id` FROM testTable
WHERE `value` IN ('a')
GROUP BY `id`
HAVING COUNT(DISTINCT testTable.`value`) = 1
)
GROUP BY `value`
ORDER BY count DESC
LIMIT 1;
Since I want no values that exist in all entries, I compare with <
using the HAVING
clause.
The end result would be:
SELECT
`value`,
COUNT(*) as count
FROM testTable
WHERE `id` IN (
SELECT `id` FROM testTable
WHERE `value` IN ('a')
GROUP BY `id`
HAVING COUNT(DISTINCT testTable.`value`) = 1
)
GROUP BY `value`
HAVING count < (
SELECT
COUNT(*) as count
FROM testTable
WHERE `id` IN (
SELECT `id` FROM testTable
WHERE `value` IN ('a')
GROUP BY `id`
HAVING COUNT(DISTINCT testTable.`value`) = 1
)
GROUP BY `value`
ORDER BY count DESC
LIMIT 1
)
ORDER BY count DESC;
This will return:
┌─────┬─────┐
│value│count│
├─────┼─────┤
│b │ 1 │
└─────┴─────┘
精彩评论