Optimizing a SQL Query for a Many to One Relationship

2023-04-06 19:06

I've got two tables with a many to one relationship which I'll call Parent_Table and Child_Table (i.e. a parent has zero or more children, but children have exactly one parent). I need to count the number of parents who have at least one child that fulfills some condition. Which query is optimal?

Option 1 (pretty sure it's not this one)

SELECT COUNT(DISTINCT(pt.ID)) 
FROM PARENT_TABLE pt
JOIN CHILD_TABLE ct
ON pt.ID =  ct.PARENT_ID
WHERE <parent meets some condition>
AND <child meets some condition>

Option 2

SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE pt.ID in
(
SELECT ct.PARENT_ID
FROM CHILD_TABLE ct
WHERE <child meets condition>
)
AND <parent meets some condition>

Option 3 (my guess as the fastest)

SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND <child meets condition>
)
AND <parent meets some condition>

Or is it something else entirely? Does it depend on the sizes of each table, or the complexity of the two conditions, or whether the data is sorted?

EDIT: Database is Oracle.

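The first query is likely to be slow; the others should run fast on most databases.

Without knowing the database it's hard to say more, but:

count(*) is often faster than count(some_field) and never slower, because count(some_field) has to skip rows where the field is NULL.
count(distinct some_field) is slow, because the duplicates have to be removed before counting.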
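As a rough sketch of that advice, Option 3 could be written with COUNT(*) instead of COUNT(pt.ID). The columns REGION and STATUS are hypothetical stand-ins for the two conditions left open in the question, and the rewrite assumes pt.ID is a NOT NULL primary key, so both forms return the same number.

```sql
-- Option 3 with COUNT(*). Assumes pt.ID is a NOT NULL primary key,
-- so COUNT(*) and COUNT(pt.ID) give the same result.
SELECT COUNT(*)
FROM PARENT_TABLE pt
WHERE pt.REGION = 'EU'              -- hypothetical parent condition
AND EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND ct.STATUS = 'ACTIVE'            -- hypothetical child condition
);
```

No DISTINCT is needed here: EXISTS is evaluated once per parent row, so each parent is counted at most once.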

Or is it something else entirely?

That depends on the DB and the exact version of the DB.

Does it depend on the sizes of each table

Yes, that plays a big part.

or the complexity of the two conditions

Possibly.

or whether the data is sorted?

Physical sort order matters less than indexing. If you want a fast select, the fields used in the join should be indexed.
The fields used in the where clause should either be indexed or be low-cardinality.
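A minimal sketch of what that indexing could look like on Oracle. The index names and the columns STATUS and REGION are assumptions made for illustration; only PARENT_ID and ID come from the question. Oracle does not create an index on a foreign key column automatically, so CHILD_TABLE.PARENT_ID usually has to be indexed by hand.

```sql
-- Join column first, then the (hypothetical) filter column, so the
-- EXISTS / IN probe on CHILD_TABLE can be answered from the index alone.
CREATE INDEX child_parent_status_ix
ON CHILD_TABLE (PARENT_ID, STATUS);

-- Index for the (hypothetical) parent-side condition. PARENT_TABLE.ID is
-- assumed to be the primary key and therefore already indexed.
CREATE INDEX parent_region_ix
ON PARENT_TABLE (REGION);
```

Whether the optimizer actually uses either index depends on table statistics and on how selective the conditions are.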

For me the first one seems the best since it's the easiest to read, but that obviously doesn't answer your question.

What you really have to do is generate execution plans for each of the queries and analyze them (I think most popular DBMSs have a tool for that). The plan gives you an estimated cost for each query.

If you can't do that, I guess you could run each query a number of times and compare the execution times.
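On Oracle, one way to get such a plan is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. The sketch below uses Option 3 with hypothetical REGION and STATUS columns in place of the unspecified conditions.

```sql
-- Store the optimizer's plan for the statement in PLAN_TABLE
-- (the statement itself is not executed).
EXPLAIN PLAN FOR
SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND ct.STATUS = 'ACTIVE'            -- hypothetical child condition
)
AND pt.REGION = 'EU';               -- hypothetical parent condition

-- Print the most recently explained plan, including the Cost column.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The cost shown is the optimizer's estimate, not a measured run time, so it is most useful for comparing alternatives on the same data and statistics.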
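If the queries are run from SQL*Plus, a simple way to compare run times is its SET TIMING command. This is only a sketch; the STATUS column is a hypothetical stand-in for the child condition.

```sql
-- SQL*Plus command: print the elapsed time after each statement.
SET TIMING ON

-- Run each candidate several times; the first run is often slower
-- because the data blocks are not yet cached.
SELECT COUNT(*)
FROM PARENT_TABLE pt
WHERE pt.ID IN
(
SELECT ct.PARENT_ID
FROM CHILD_TABLE ct
WHERE ct.STATUS = 'ACTIVE'          -- hypothetical child condition
);
```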

Or is it something else entirely? Does it depend on the sizes of each table, or the complexity of the two conditions, or whether the data is sorted?

All of that and more.

Like the commenters say, the best way to answer this question is to run the queries and measure.

However, in general, database engines optimize joins very, very efficiently - I'm pretty sure you will find almost no difference between the 3 queries, and it's entirely possible the query optimizers will turn them all into the same basic query (2 and 3 are equivalent as it is).
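One way to check whether the optimizer really treats Options 2 and 3 alike is to explain both under different statement IDs and compare the output. This is a sketch for Oracle; the statement IDs and the REGION and STATUS columns are made up for illustration.

```sql
-- Option 2 (IN)
EXPLAIN PLAN SET STATEMENT_ID = 'opt2' FOR
SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE pt.ID IN
(
SELECT ct.PARENT_ID
FROM CHILD_TABLE ct
WHERE ct.STATUS = 'ACTIVE'          -- hypothetical child condition
)
AND pt.REGION = 'EU';               -- hypothetical parent condition

-- Option 3 (EXISTS)
EXPLAIN PLAN SET STATEMENT_ID = 'opt3' FOR
SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND ct.STATUS = 'ACTIVE'
)
AND pt.REGION = 'EU';

-- Show the two plans; matching operations and costs suggest the
-- optimizer rewrote both into the same semi-join.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'opt2'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'opt3'));
```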

By far the biggest impact on the query will come from the "child meets some condition" and "parent meets some condition" clauses. I'd concentrate on optimizing those.
