开发者

How do I avoid repeating this subquery for the IN clause?

I have an SQL script (currently running against SQLite, but it should probably work against any DB engine) that uses the same subquery twice, and since it might be fetching a lot of records (the table has a couple of million rows) I'd like to only call it once.

A shortened pseudo-version of the query looks like this:

SELECT * FROM
    ([the subquery, returns a column of ids]) AS sq
[a couple of joins, that fetches things from other tables based on the ids]
WHERE thisorthat NOT IN ([the subquery again])

I tried just using the name (sq) in various ways (with/without parenthesis, with/without na开发者_如何学Goming the column of sq etc) but to no avail.

Do I really have to repeat this subquery?

Clarification: I am doing this in python and sqlite as a small demo of what can be done, but I would like my solution to scale as well as possible with as little modification as possible. In the real situation, the database will have a couple of million rows, but in my example there is just 10 rows with dummy data. Thus, code that would be well optimized on for example MySQL is absolutely good enough - it doesn't have to be optimized specifically for SQLite. But as I said, the less modification needed, the better.


There is a WITH clause in standard SQL, however, I don't know if it is supported by SQLlite - though of course worth a try:

WITH mySubQuery AS
(
  [the subquery code]
)

SELECT * FROM
    mySubQuery AS sq
    [a couple of joins, that fetches things from other tables based on the ids]
WHERE thisorthat NOT IN (mySubQuery)

That said, what you do here will likely be horribly slow for any data set that is more than a few thousand rows, so I'd try to remodel it if possible - NOT IN should be avoided in general, especially if you also have a couple of joins.


Do you need a subquery? You could probably rewrite using an OUTER JOIN e.g. something like:

SELECT * 
  FROM [the subquery's FROM clause] AS sq
       RIGHT OUTER JOIN [a couple of tables based on the ids]
          ON thisorthat = sq.[a column of ids]
 WHERE sq.[a column of ids] IS NULL;


In general, I question the need to eliminate the duplication. The SQL compiler can see that two subqueries are identical and chose to only do them once if that seems optimal.

In addition, by leaving duplicates in the source, the SQL compiler and optimizer is given the opportunity to treat them differently. For example the subquery flattening optimization of SQLite might be applied to one of a pair of duplicates or applied differently to each. See section 9.0, Subquery flattening of https://www.sqlite.org/optoverview.html.


you can put the SELECT part into a View than you can filter the View results using the alias "sq"

I hope it's helpful

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜