开发者

Non-equivalence joins

I've been doing SQL for several years, and throughout all that time, my thought of joins was that of equivalence joins, as in

select ... from t1 join t2 on t1.a = t2.b

Notice how the join is based on one or more equalities, here t1.a = t2.b. However, recently, I don't remember where, I saw a non-equivalence join (I just made this term up, please tell me if there's a real name for it), where the join condition contains at least one non-equality, as in

select ... from t1 join t2 on t1.a > t2.b

Which can be done to do some nice things, especially with out开发者_运维百科er joins. Let me illustrate this with an example.

Let's consider a table called products, with the following data:

product   year  price
----------------------
apple   2009    4
apple   2008    2
apple   2007    5
apple   2006    6
apple   2005    2
banana  2009    9
banana  2008    12
banana  2007    16
banana  2006    15
banana  2005    10

And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so:

select t1.`name`, t1.`year`, t1.`price`
from products as t1 join
( select `name`, max(`price`) as `max_price` from products group by `name` ) as t2 on t1.`name`=t2.`name` and t1.`price`=t2.`max_price`

So, on t2, we're getting the maximum price for each product, and then we join this result with the same table, to get the rest of the data for that column (it gets a little tricky for tiebreaking)

However, with a non-equivalence outer join, we can do it like so:

select t1.`name`, t1.`year`, t1.`price`
from products as t1 left join products as t2 on t1.`name`=t2.`name` and t1.`price` < t2.`price`
where t2.`name` is null

This time, we're joining the same table twice, where the price on t1 is lesser than the price on t2. The trick here is that, since this is a left outer join, t2 values on the resulting join will be null when the join doesn't match, which happens for the maximum value for the price.

Both these queries yield the same result, but I'm not really sure which one performs better. The first query has an expensive grouping, while the second one must manually check all t1/t2 pairs to get a result. Tiebreaking seems to be easier with the non-equivalence join though.

So, my question is:

Are there any recommended sources (books, webpages) that discuss non-equivalence joins more in depth, explaining what you can do with them (I'm assuming you can do lots more than just getting maximum values in groups), and how they perform against other methods for doing the same things?

Edit: I know windowing functions are also available for doing things such as the trivial example I mentioned above. I'm not asking how to get the maximum value of a table. I know how to do this, and I even provided two ways to do it. I want to know what else can I do with non-equivalence joins.


And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so

A lot easier done using a windowing function (as jonealres has already mentioned)

select name, 
       year, 
       price,
       max(price) over (partition by name) as most_expensive
from products


I did similar things here SQL Query Creating Start and End Dates and here SQL Server logical grouping most recent time.

This kind of joins are not different then the "equality" ones. The on clause is just a logical evaluation. In case indexes exist it is evaluated closely to a where clause on indexes.

Other then that it depends on everyones imagination. SQL is rich, so it can be combined.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜