Non-equivalence joins

2023-02-19 04:21 问答作者：

I've been doing SQL for several years, and throughout all that time, my thought of joins was that of equivalence joins, as in

select ... from t1 join t2 on t1.a = t2.b

Notice how the join is based on one or more equalities, here t1.a = t2.b. However, recently, I don't remember where, I saw a non-equivalence join (I just made this term up, please tell me if there's a real name for it), where the join condition contains at least one non-equality, as in

select ... from t1 join t2 on t1.a > t2.b

Which can be done to do some nice things, especially with out开发者_运维百科er joins. Let me illustrate this with an example.

Let's consider a table called products, with the following data:

product   year  price
----------------------
apple   2009    4
apple   2008    2
apple   2007    5
apple   2006    6
apple   2005    2
banana  2009    9
banana  2008    12
banana  2007    16
banana  2006    15
banana  2005    10

And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so:

select t1.`name`, t1.`year`, t1.`price`
from products as t1 join
( select `name`, max(`price`) as `max_price` from products group by `name` ) as t2 on t1.`name`=t2.`name` and t1.`price`=t2.`max_price`

So, on t2, we're getting the maximum price for each product, and then we join this result with the same table, to get the rest of the data for that column (it gets a little tricky for tiebreaking)

However, with a non-equivalence outer join, we can do it like so:

select t1.`name`, t1.`year`, t1.`price`
from products as t1 left join products as t2 on t1.`name`=t2.`name` and t1.`price` < t2.`price`
where t2.`name` is null

This time, we're joining the same table twice, where the price on t1 is lesser than the price on t2. The trick here is that, since this is a left outer join, t2 values on the resulting join will be null when the join doesn't match, which happens for the maximum value for the price.

Both these queries yield the same result, but I'm not really sure which one performs better. The first query has an expensive grouping, while the second one must manually check all t1/t2 pairs to get a result. Tiebreaking seems to be easier with the non-equivalence join though.

So, my question is:

Are there any recommended sources (books, webpages) that discuss non-equivalence joins more in depth, explaining what you can do with them (I'm assuming you can do lots more than just getting maximum values in groups), and how they perform against other methods for doing the same things?

Edit: I know windowing functions are also available for doing things such as the trivial example I mentioned above. I'm not asking how to get the maximum value of a table. I know how to do this, and I even provided two ways to do it. I want to know what else can I do with non-equivalence joins.

And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so

A lot easier done using a windowing function (as jonealres has already mentioned)

select name, 
       year, 
       price,
       max(price) over (partition by name) as most_expensive
from products

I did similar things here SQL Query Creating Start and End Dates and here SQL Server logical grouping most recent time.

This kind of joins are not different then the "equality" ones. The on clause is just a logical evaluation. In case indexes exist it is evaluated closely to a where clause on indexes.

Other then that it depends on everyones imagination. SQL is rich, so it can be combined.

继续阅读：oracle postgresql sql sql-server

Non-equivalence joins

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？