Non-equivalence joins
I've been doing SQL for several years, and throughout all that time, my thought of joins was that of equivalence joins, as in
select ... from t1 join t2 on t1.a = t2.b
Notice how the join is based on one or more equalities, here t1.a = t2.b
. However, recently, I don't remember where, I saw a non-equivalence join (I just made this term up, please tell me if there's a real name for it), where the join condition contains at least one non-equality, as in
select ... from t1 join t2 on t1.a > t2.b
Which can be done to do some nice things, especially with out开发者_运维百科er joins. Let me illustrate this with an example.
Let's consider a table called products, with the following data:
product year price
----------------------
apple 2009 4
apple 2008 2
apple 2007 5
apple 2006 6
apple 2005 2
banana 2009 9
banana 2008 12
banana 2007 16
banana 2006 15
banana 2005 10
And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so:
select t1.`name`, t1.`year`, t1.`price`
from products as t1 join
( select `name`, max(`price`) as `max_price` from products group by `name` ) as t2 on t1.`name`=t2.`name` and t1.`price`=t2.`max_price`
So, on t2, we're getting the maximum price for each product, and then we join this result with the same table, to get the rest of the data for that column (it gets a little tricky for tiebreaking)
However, with a non-equivalence outer join, we can do it like so:
select t1.`name`, t1.`year`, t1.`price`
from products as t1 left join products as t2 on t1.`name`=t2.`name` and t1.`price` < t2.`price`
where t2.`name` is null
This time, we're joining the same table twice, where the price on t1 is lesser than the price on t2. The trick here is that, since this is a left outer join, t2 values on the resulting join will be null when the join doesn't match, which happens for the maximum value for the price.
Both these queries yield the same result, but I'm not really sure which one performs better. The first query has an expensive grouping, while the second one must manually check all t1/t2 pairs to get a result. Tiebreaking seems to be easier with the non-equivalence join though.
So, my question is:
Are there any recommended sources (books, webpages) that discuss non-equivalence joins more in depth, explaining what you can do with them (I'm assuming you can do lots more than just getting maximum values in groups), and how they perform against other methods for doing the same things?
Edit: I know windowing functions are also available for doing things such as the trivial example I mentioned above. I'm not asking how to get the maximum value of a table. I know how to do this, and I even provided two ways to do it. I want to know what else can I do with non-equivalence joins.
And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so
A lot easier done using a windowing function (as jonealres has already mentioned)
select name,
year,
price,
max(price) over (partition by name) as most_expensive
from products
I did similar things here SQL Query Creating Start and End Dates and here SQL Server logical grouping most recent time.
This kind of joins are not different then the "equality" ones. The on clause is just a logical evaluation. In case indexes exist it is evaluated closely to a where clause on indexes.
Other then that it depends on everyones imagination. SQL is rich, so it can be combined.
精彩评论