Select multiple rows by max
I have a simple table with a "versioning" scheme:
Version | PartKey1 | PartKey2 | Value
1 | 0 | 0 | foo
2 | 0 | 0 | bar
1 | 1 | 0 | foobar
This table is medium (~100 000 lines for a full version). At the start it is loaded with a version 1 which contains a full snapshot, and over time incremental updates are added, but we want to preserve the old versions, thus they are added with an incremented "Version" number (2 here).
When reading the data, I want to be able to specify a maximum version, and I would like, if possible, to only retrieve the "rows" I am interested in.
For example: specifying 2 as the maximum version, I would like a query that retrieve only 2 rows in the table above:
Version | PartKey1 | PartKey2 | Value
2 | 0 | 0 | bar
1 | 1 | 0 | foobar
The row:
1 | 0 | 0 | foo
is discarded because the version 2
of this row is more recent.
I was wondering if such a selection was possible / advisable in a SQL query. I can do the filtering on the application side, but obviously it means pulling in useless resources from the D开发者_JS百科B so if it's possible (and cheap on the DB side) I'd rather offload this work to the DB.
You can do:
SELECT v1.*
FROM versioningscheme v1
LEFT JOIN versioningscheme v2
ON v2.partkey1 = v1.partkey1 AND v2.partkey2 = v1.partkey2
AND v2.version > v1.version
WHERE v2.version IS NULL
Left Join with NULL detection is very powerful and underused. Null values are returned when there is no match (and obviously, when you have the max row in v1, you can't get a row in v2 that satisfies the join condition).
select t.*
from MyTable t
inner join (
select PartKey1, PartKey2, max(Version) as MaxVersion
from MyTable
where Version <= 2
group by PartKey1, PartKey2
) tm on t.PartKey1 = tm.PartKey1
and t.PartKey2 = tm.PartKey2
and t.Version = tm.MaxVersion
This is common with time varying data (Where you choose to find the most recent value within a specific window of time), and is completely reasonable.
In your case, ROW_NUMBER() allows the data to be parsed just once, rather than multiple times. With an appropriate INDEX such as (PartKey1, PartKey2, Version)
, this should be exceptionally quick...
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY PartKey1, PartKey2 ORDER BY Version DESC) AS reversed_version
FROM
MyTable
WHERE
Version <= <MaxVersionParamter>
)
AS data
WHERE
reversed_version = 1
精彩评论