开发者

Select multiple rows by max

I have a simple table with a "versioning" scheme:

Version | PartKey1 | PartKey2 | Value
   1    |    0     |    0     | foo
   2    |    0     |    0     | bar
   1    |    1     |    0     | foobar

This table is medium (~100 000 lines for a full version). At the start it is loaded with a version 1 which contains a full snapshot, and over time incremental updates are added, but we want to preserve the old versions, thus they are added with an incremented "Version" number (2 here).

When reading the data, I want to be able to specify a maximum version, and I would like, if possible, to only retrieve the "rows" I am interested in.

For example: specifying 2 as the maximum version, I would like a query that retrieve only 2 rows in the table above:

Version | PartKey1 | PartKey2 | Value
   2    |    0     |    0     | bar
   1    |    1     |    0     | foobar

The row:

   1    |    0     |    0     | foo

is discarded because the version 2 of this row is more recent.

I was wondering if such a selection was possible / advisable in a SQL query. I can do the filtering on the application side, but obviously it means pulling in useless resources from the D开发者_JS百科B so if it's possible (and cheap on the DB side) I'd rather offload this work to the DB.


You can do:

SELECT v1.*
  FROM versioningscheme v1
  LEFT JOIN versioningscheme v2
    ON v2.partkey1 = v1.partkey1 AND v2.partkey2 = v1.partkey2
   AND v2.version > v1.version
 WHERE v2.version IS NULL

Left Join with NULL detection is very powerful and underused. Null values are returned when there is no match (and obviously, when you have the max row in v1, you can't get a row in v2 that satisfies the join condition).


select t.*
from MyTable t
inner join (
    select PartKey1, PartKey2, max(Version) as MaxVersion
    from MyTable
    where Version <= 2
    group by PartKey1, PartKey2
) tm on t.PartKey1 = tm.PartKey1 
    and t.PartKey2 = tm.PartKey2 
    and t.Version = tm.MaxVersion


This is common with time varying data (Where you choose to find the most recent value within a specific window of time), and is completely reasonable.

In your case, ROW_NUMBER() allows the data to be parsed just once, rather than multiple times. With an appropriate INDEX such as (PartKey1, PartKey2, Version), this should be exceptionally quick...

SELECT
  *
FROM
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY PartKey1, PartKey2 ORDER BY Version DESC) AS reversed_version
  FROM
    MyTable
  WHERE
    Version <= <MaxVersionParamter>
)
  AS data
WHERE
  reversed_version = 1
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜