Queries within queries: Is there a better way?

2023-02-03 14:34 问答作者：

As I build bigger, more advanced web applications, I'm finding myself writing extremely long and complex queries. I tend to write queries within queries a lot because I feel making one call to the database from PHP is better than making several and correlating the data.

However, anyone who kn开发者_运维百科ows anything about SQL knows about JOINs. Personally, I've used a JOIN or two before, but quickly stopped when I discovered using subqueries because it felt easier and quicker for me to write and maintain.

Commonly, I'll do subqueries that may contain one or more subqueries from relative tables.

Consider this example:

SELECT 
  (SELECT username FROM users WHERE records.user_id = user_id) AS username,
  (SELECT last_name||', '||first_name FROM users WHERE records.user_id = user_id) AS name,
  in_timestamp,
  out_timestamp
FROM records
ORDER BY in_timestamp

Rarely, I'll do subqueries after the WHERE clause.

Consider this example:

SELECT
  user_id,
  (SELECT name FROM organizations WHERE (SELECT organization FROM locations WHERE records.location = location_id) = organization_id) AS organization_name
FROM records
ORDER BY in_timestamp

In these two cases, would I see any sort of improvement if I decided to rewrite the queries using a JOIN?

As more of a blanket question, what are the advantages/disadvantages of using subqueries or a JOIN? Is one way more correct or accepted than the other?

In simple cases, the query optimiser should be able to produce identical plans for a simple join versus a simple sub-select.

But in general (and where appropriate), you should favour joins over sub-selects.

Plus, you should avoid correlated subqueries (a query in which the inner expression refer to the outer), as they are effectively a for loop within a for loop). In most cases a correlated subquery can be written as a join.

JOINs are preferable to separate [sub]queries.
If the subselect (AKA subquery) is not correlated to the outer query, it's very likely the optimizer will scan the table(s) in the subselect once because the value isn't likely to change. When you have correlation, like in the example provided, the likelihood of single pass optimization becomes very unlikely. In the past, it's been believed that correlated subqueries execute, RBAR -- Row By Agonizing Row. With a JOIN, the same result can be achieved while ensuring a single pass over the table.

This is a proper re-write of the query provided:

   SELECT u.username,
          u.last_name||', '|| u.first_name AS name,
          r.in_timestamp,
          r.out_timestamp
     FROM RECORDS r 
LEFT JOIN USERS u ON u.user_id = r.user_id
 ORDER BY r.in_timestamp

...because the subselect can return NULL if the user_id doesn't exist in the USERS table. Otherwise, you could use an INNER JOIN:

  SELECT u.username,
         u.last_name ||', '|| u.first_name AS name,
         r.in_timestamp,
         r.out_timestamp
    FROM RECORDS r 
    JOIN USERS u ON u.user_id = r.user_id
ORDER BY r.in_timestamp

Derived tables/inline views are also possible using JOIN syntax.

a) I'd start by pointing out that the two are not necessarily interchangable. Nesting as you have requires there to be 0 or 1 matching value otherwise you will get an error. A join puts no such requirement and may exclude the record or introduce more depending on your data and type of join.

b) In terms of performance, you will need to check the query plans but your nested examples are unlikely to be more efficient than a table join. Typically sub-queries are executed once per row but that very much depends on your database, unique constraints, foriegn keys, not null etc. Maybe the DB can rewrite more efficiently but joins can use a wider variety of techniques, drive the data from different tables etc because they do different things (though you may not observe any difference in your output depending on your data).

c) Most DB aware programmers I know would look at your nested queries and rewrite using joins, subject to the data being suitably 'clean'.

d) Regarding "correctness" - I would favour joins backed up with proper constraints on your data where necessary (e.g. a unique user ID). You as a human may make certain assumptions but the DB engine cannot unless you tell it. The more it knows, the better job it (and you) can do.

Joins in most cases will be much more faster.

Lets take this with an example.

Lets use your first query:

SELECT 
(SELECT username FROM users WHERE records.user_id = user_id) AS username,
  (SELECT last_name||', '||first_name FROM users WHERE records.user_id = user_id) AS name,
  in_timestamp,
  out_timestamp
FROM records
ORDER BY in_timestamp

Now consider we have 100 records in records and 100 records in user.(Assuming we dont have index on user_id)

So if we understand your algorithm it says: For each record Scan all 100 records in users to find out username Scan all 100 records in users to find out last name and first name

So its like we scanned users table 100*100*2 time. Is it really worth. If we consider index on user_id it will make thing better, but is it still worth.

Now consider a join (nested loop will almost produce same result as above, but consider a hash join): Its like. Make a hash map of user. For each record Find a mapping record in Hashmap. Which will be certainly much more faster then looping and finding a record.

So clearly, joins should be favorable.

NOTE: Example used of 100 record may produce identical plan, but the idea is to analyze how it can effect the performance.

继续阅读：join postgresql query-optimization sql subquery

Queries within queries: Is there a better way?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？