Converting aggregate operators from SQL to relational algebra
I have several SQL queries written that I want to convert to relational algebra. However, some of the queries use aggregate operators and I don't know how to convert them. Notably they use COUNT and GROUP BY.. HAVING operators.
Here is the schema:
Sailors(sid, sname, rating) Reserves(sid, bid, price) Boats(bid, bname)
Here is an example of what I'm doing: find the bids and bnames of all boats reserved by exactly 2 sailors.
SELECT B.bid, B.bname
FROM Boats B, Reserves R
WHERE B.bid = R.bid
GROUP BY R.bid
HAVING 2 = (SELECT COUNT(*)
FROM Reserve开发者_如何学编程s R2
WHERE R2.bid = B.bid);
Allowable relational algebra operations: selection, projection, join, conditional join, rename, union, intersection, cross-product, division
This is only half an answer...
The relation "boats reserved by two or more sailors" can be found using conditional join and projection, which are both in your set of allowed operations:
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid;
The relation "boats reserved by three or more sailors" can be found using conditional join (twice) and projection, which are both in your set of allowed operations:
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
JOIN Reserves AS R3
ON R1.bid = R3.bid
AND R2.sid < R3.sid;
If we had a minus operator e.g. EXCEPT
in Standard SQL:
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
EXCEPT
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
JOIN Reserves AS R3
ON R1.bid = R3.bid
AND R2.sid < R3.sid;
If we had restriction (WHERE
in SQL) and a semi difference (a.k.a. antijoin) operator (e.g. NOT IN
in SQL):
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
WHERE R1.bid NOT IN (
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
JOIN Reserves AS R3
ON R1.bid = R3.bid
AND R2.sid < R3.sid
);
...but your set of allowed operations does not include restriction, semi difference or minus :(
"I was reading a book with a chapter on relational algebra and it didn't mention aggregate functions for them at all".
Literature on the relational algebra typically limits itself to the portion of the algebra that makes it closed over relations. An algebra is closed over a set of types (I'm probably expressing myself a bit sloppily, but the main idea is right) if none of the operators of the algebra returns a value of a type that isn't a member of that set of types the algebra is closed over.
If all you have (or want to consider in a book) is the set of all relation types, and you want to write a treatment of the algebra, then you cannot define an operator that returns an integer (COUNT), or a float (HARMONICMEAN), or a date (MIN(<date column>)), or whatever similar kind of thing, without breaking the 'closed' property of the algebra.
That is not to say that such aggregation operations are useless (of course not). They're typically just not exactly relevant in a context where the primary purpose is to explain JOIN, PROJECT, RESTRICT, etc.
EDIT
supplementary piece of answer regarding GROUP BY ... HAVING. You noticed correctly that this SQL construct is nontrivial stuff when it comes to algebra equivalents. The gist of it is that the set of algebra operators that you mention, lacks an operator that is needed to achieve such stuff, and this operator is GROUP. GROUP takes an input relation, and produces an output relation in which one of the attributes is relation-valued.
For example, GROUP ( RESERVES , SAILORS_AND_THEIR_BID ( SID , PRICE ) ) would produce a relation of degree 2, with attributes BID and SAILORS_AND_THEIR_BID. The latter attribute is relation-valued, so that the expression COUNT(SAILORS_AND_THEIR_BID) becomes valid in the context of a RESTRICT condition applied to this relation, such that you can write ( GROUP ( RESERVES , SAILORS_AND_THEIR_BID ( SID , PRICE ) ) ) WHERE COUNT(SAILORS_AND_THEIR_BID) = 2.
Building on onedaywhen's answer:
Yes, the lack of a set difference operator does hurt. It should totally be allowed. However, we can express set difference with set complement and intersection:
B - A = B ∩ A'
i.e. the difference of B and A is in fact B's intersection with A's complement. We have the intersection as an allowed operator, and while a relation's proper complement is an ugly thing, the complement of an R1 ⊆ R relative to R (i.e. the stuff in R that aren't in R1) can be found easily with a join:
SELECT DISTINCT R0.x
FROM R as R1
JOIN R as R0 ON R1.x<>R0.x
WHERE R1.x=val
is the complement relative to R of
SELECT DISTINCT R.x FROM R WHERE R.x=val
So, here comes a solution to the riddle, I think: it's easy to get all the boats reserved by two or more guys: select all the boats that are in the reserves table, take the cartesian product of the result with itself, then select each row that has different sailor1 and sailor2. In the unwieldy relational algebra notation they taught me:
π( R.bid ) (
σ( R.bid=R2.bid and R.sid<R2.sid )( R x ρ(R, R2) )
)
(where π is the projection operator, σ is the selection operator, and ρ is the rename operator)
This nets the id's of all the boats reserved by two or more people. Now I'm going to get all the boats that were reserved by two or less guys. To do this, I'll select all the boats reserved by three or more guys, and take the set's complement by selecting all the rows from the original table which aren't present in that set. It won't be pretty, but here it goes:
π(R.bid)(σ(R.bid<>R1.bid)(
π(R.bid)(R)
x
π(R1.bid) (
σ( R1.bid=R2.bid and R2.bid=R3.bid and R1.sid<R2.sid and R2.sid<R3.sid )( ρ(R, R1) x ρ(R, R2) x ρ(R, R3) )
)
))
You see, I select all the rows which have the property, then select all the rows from the original table which aren't these, netting us all the rows which don't have the property, here meaning all the boats not reserved by three or more people, the boats reserved by two or less persons.
To get the boats with exactly two guys reserving them, simply intersect this with the set of boats reserved by more than one guy.
π( R.bid ) (
σ( R.bid=R2.bid and R.sid<R2.sid )( R x ρ(R, R2) )
) ∩ π( R.bid ) (
σ(R.bid<>R1.bid)(
π(R.bid)(R)
x
π(R1.bid) (
σ( R1.bid=R2.bid and R2.bid=R3.bid and R1.sid<R2.sid and R2.sid<R3.sid )( ρ(R, R1) x ρ(R, R2) x ρ(R, R3) )
)
)
)
Ugh. It's so ugly it hurts. I wish I knew a nicer notation.
SQLishly, it might look like this, I think:
(SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2 ON R1.bid = R2.bid AND R1.sid < R2.sid
) INTERSECT (
SELECT DISTINCT R.bid
FROM Reserves AS R1
JOIN Reserves AS R2 ON R1.bid = R2.bid AND R1.sid < R2.sid
JOIN Reserves AS R3 ON R1.bid = R3.bid AND R2.sid < R3.sid
JOIN Reserves AS R ON R.bid<>R1.bid
)
Please note that this is exactly onedaywhen's solution, except I expressed set difference as taking the intersection with the complement.
精彩评论