SQL to join the first record in a custom sort order
Sometimes I want to join a record that I can easily identify as "sort by this and that and pick the first record". I wonder what the best practice here.
For example in my data model I have Services and Implementations. A Service may have multiple Implementations. Implementations have version numbers; zero or more implementations may be installed and one of them may also be enabled. Now I want to select the "current implementation" for each service using the following rules:
- If nothing is installed, then the current implementation is the most recent one.
- If more than one is installed, then the current is either the one that is enabled or, if nothing is enabled, the most recent one again.
Some examples:
Service Version Installed Enabled
======= ======= ========= =======
A 1 False False
A 开发者_如何学JAVA 2 False False <- current for service A because most recent
B 1 True False <- current for service B because installed
B 2 False False
C 1 True False
C 2 True False <- current for service C because most recent
C 3 False False among installed
D 1 True True <- current for service D because enabled
D 2 True False
I figured that if I sort by these fields and pick the first record, it will do the trick. I.e. I do something like that:
SELECT s.service, <other service fields>, i.version, <other impl. fields>
FROM service s, implementation i
WHERE i.rowid == (
SELECT rowid
FROM implementation o
WHERE o.service == s.service
ORDER BY installed DESC, enabled DESC, version ASC)
Update. This one is rather specific to SQLite, I guess. First, rowid
is the internal record ID that is used by SQLite. Second, in SQLite the SELECT
expression in this context returns a single value. I guess it could be rewritten into a more generic SQL like this:
...
FROM service s, implementation i
WHERE i.service == s.service
AND i.version == (
SELECT version
FROM ...
ORDER BY ...
LIMIT 1)
It works here, but sometimes there's no rowid
or another single field that can serve as an identifier. Are there any other ways to get the same result? Something more generic, perhaps? (And for me something more specific is OK too, as long as it's SQLite-specific :)
You can use a NOT EXISTS condition to exclude all but the desired match. Join rows of [service] and [implementation] that match on the [service] column so long as there does not exist another row of [implementation] that also matches on the [service] column but is preferred because it would rank higher in the desired ordering. Here is the idea.
SELECT s.service, <other service fields>, i.version, <other impl. fields>
FROM service s JOIN implementation i
ON i.service = s.service
AND NOT EXISTS ( -- where there's no "better" row from i to use
SELECT * FROM implementation AS i2
WHERE i2.service = i.service
AND (
i2.installed > i.installed
OR (i2.installed = i.installed AND i2.enabled > i.enabled)
OR (i2.installed = i.installed AND i2.enabled = i.enabled AND i2.version < i.version)
)
)
Using Microsoft SQL Server or another SQL dialect that supports the CROSS APPLY operator, this is much, much simpler:
SELECT s.service, <other service fields>, i.version, <other impl. fields>
FROM service s
CROSS APPLY (
SELECT TOP 1 * FROM implementation
WHERE implementation.service = s.service
ORDER BY installed DESC, enabled DESC, version ASC
) AS i
(Neither solution has been tested with sample data, as CREATE TABLE and INSERT statements were not posted.)
Steve Kass
I couldn't think of anything better than this:
SELECT
i.Service,
i.Installed,
i.Enabled,
MAX(i.Version) AS Version
FROM implementation i
INNER JOIN (
SELECT
i.Service,
i.Installed,
MAX(i.Enabled) AS Enabled
FROM implementation i
INNER JOIN (
SELECT
Service,
MAX(Installed) AS Installed
FROM implementation
GROUP BY
Service
) m ON i.Service = m.Service
GROUP BY
i.Service,
i.Installed
) m ON i.Service = m.Service AND i.Installed = m.Installed
GROUP BY
i.Service,
i.Installed
i.Enabled
Basically this script finds maximum values of Installed
for every Service
, then joins the obtained list back to the original table (to filter on the found values) and finds maximums of Enabled
, then again joins the obtained table back to implementation
to finally retrieve maximum values of Version
for the filtered rows.
精彩评论