SQL: tuple comparison
In my current application, I need to be able to do this type of query:
SELECT MIN((colA, colB, colC))
FROM mytable
WHERE (colA, colB, colC) BETWEEN (200, 'B', 'C') AND (1000, 'E', 'F')
and get the answer of (333, 'B', 'B')
, given this data:
+------+------+------+
| colA | colB | colC |
+------+------+------+
| 99 | A | A |
| 200 | A | Z |
| 200 | B | B |
| 333 | B | B |
| 333 | C | D |
| 333 | C | E |
| 333 | D | C |
| 1000 | E | G |
| 1000 | F | A |
+------+------+------+
What is the most ef开发者_JAVA百科ficient way to accomplish this in real SQL? Please keep in mind that this is a toy example, and that my actual application has tables with varying columns and data types, and hundreds of million of rows. I use MySQL, if that helps. You can also assume that these columns have a PRIMARY or UNIQUE index on them.
If the solution is easily extensible to more/less columns, that's even better.
Tuple Comparison:
Several have asked so I should put this in the question. Tuples are ordered lexicographically, meaning that the sequences are ordered the same as their first differing elements. For example, (1,2,x) < (1,2,y) returns the same as x < y.
It's worth noting that SQL (or at least mysql) implements this correctly:
mysql> select (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F');
+--------------------------------------------------------------------------+
| (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F') |
+--------------------------------------------------------------------------+
| 1 |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)
Here's the necessary SQL to create the example:
create table mytable select 333 colA, 'B' colB, 'B' colC;
insert into mytable values (200, 'B', 'B'), (333, 'C', 'D'), (1000, 'E', 'G'),
(200, 'A', 'Z'), (1000, 'F', 'A'), (333, 'C', 'E'), (333, 'D', 'C'),
(99, 'A', 'A');
alter table mytable add unique index myindex (colA, colB, colC);
Adding this index seems to cause the table to be sorted lexicographically, which is interesting. This isn't true in our production system.
Just do:
SELECT colA
, colB
, colC
FROM mytable
WHERE ( ('A', 'B', 'C') <= (colA, colB, colC ) )
AND ( (colA, colB, colC) <= ('D', 'E', 'F' ) )
ORDER BY colA, colB, colC
LIMIT 1
;
It works just fine. And I suspect is should be pretty fast, too.
This is equivalent but it may have better performance, depending on your tables:
SELECT m.colA
, m.colB
, m.colC
FROM mytable m
WHERE ( ('A', 'B', 'C') <= (m.colA, m.colB, m.colC) )
AND ( (m.colA, m.colB, m.colC) <= ('D', 'E', 'F') )
AND NOT EXISTS
( SELECT 1
FROM mytable b
WHERE (b.colA, b.colB, b.colC) < (m. colA, m.colB, m.colC)
AND ( ('A', 'B', 'C') <= (b.colA, b.colB, b.colC) )
);
---EDIT---: (Previous wrong trials removed)
2nd try (not really relational algebra).
This works but only when the fields are char(1):
SELECT colA, colB, colC
FROM mytable
WHERE CONCAT(colA, colB, colC)
BETWEEN CONCAT('A', 'B', 'C')
AND CONCAT('D', 'E', 'F')
ORDER BY colA, colB, colC
LIMIT 1 ;
I thought that a view that shows all combinations of tuples from mytable
that are less than or equal to tuples of the same table might be helpful, as it can be used for other comparisons:
CREATE VIEW lessORequal AS
( SELECT a.colA AS smallA
, a.colB AS smallB
, a.colC AS smallC
, b.colA AS largeA
, b.colB AS largeB
, b.colC AS largeC
FROM mytable a
JOIN mytable b
ON (a.colA < b.colA)
OR ( (a.colA = b.colA)
AND ( (a.colB < b.colB)
OR (a.colB = b.colB
AND a.colC <= b.colC)
)
)
) ;
Using similar technique, this solves the question. It works with any kind of fields (int, float, char of any length). It's going to be kind of awkard and complicated though if one tries to add more fields.
SELECT colA, colB, colC
FROM mytable m
WHERE ( ('A' < colA)
OR ( ('A' = colA)
AND ( ('B' < colB)
OR ('B' = colB
AND 'C' <= colC)
)
)
)
AND ( (colA < 'D')
OR ( (colA = 'D')
AND ( (colB < 'E')
OR (colB = 'E'
AND colC <= 'F')
)
)
)
ORDER BY colA, colB, colC
LIMIT 1 ;
One also define a function:
CREATE FUNCTION IslessORequalThan( lowA CHAR(1)
, lowB CHAR(1)
, lowC CHAR(1)
, highA CHAR(1)
, highB CHAR(1)
, highC CHAR(1)
)
RETURNS boolean
RETURN ( (lowA < highA)
OR ( (lowA = highA)
AND ( (lowB < highB)
OR ( (lowB = highB)
AND (lowC <= highC)
)
)
)
);
and use it to solve the same or similar problems. This solves the question again. The query is elegant but a new function has to be created if the type or number of fields is changed.
SELECT colA
, colB
, colC
FROM mytable
WHERE IslessORequalThan( 'A', 'B', 'C', colA, colB, colC )
AND IslessORequalThan( colA, colB, colC, 'D', 'E', 'F' )
ORDER BY colA, colB, colC
LIMIT 1;
Until then, and because the condition
(colA, colB, colC) BETWEEN ('A', 'B', 'C') AND ('D', 'E', 'F')
was not allowed in MySQL, I thought that
('A', 'B', 'C') <= (colA, colB, colC)
was not allowed as well. But I was wrong.
精彩评论