开发者

How do I utilize Row_Number() (partitioning) for my datapool correctly

we have following table (output is al开发者_开发技巧ready ordered and separated for understanding):

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
----------------------------------------------------------------------------
|  3 | 100 | 500 |       Change | 2011-01-01 02:00:00 |                  Z |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------
|  4 | 100 | 510 |       Create | 2011-01-01 00:30:00 |                  T |
----------------------------------------------------------------------------
|  5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 |                  A |
----------------------------------------------------------------------------

what is ActionCode? we use this in c# and there it represents an enum-value

what do i want to achieve?

well, i need the following output:

| FK1 | FK2 |   ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 |       Create |                  H |
| 100 | 500 |       Create |                  Z |
| 100 | 510 |       Create |                  T |
| 100 | 520 | CreateSystem |                  A |
-------------------------------------------------

well, what is the actual logic? we have some logical groups for composite-key (FK1 + FK2). each of these groups can be broken into partitions, which begin with Create or CreateSystem. each partition ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last line of the partition.

it is not possible to have following datapool:

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  7 | 100 | 500 |       Change | 2011-01-02 02:00:00 |                  Z |
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------

and then expect PK 7 to affect PK 2 or PK 6 to affect PK 1.

i don't even know how/where to start ... how can i achieve this? we are running on mssql 2005+

EDIT:

there's a dump available:

  • instanceId: my PK
  • tenantId: FK 1
  • campaignId: FK 2
  • callId: FK 3
  • refillCounter: FK 4
  • ticketType: ActionCode (1 & 4 & 6 are Create, 5 is Change, 3 must be ignored)
  • ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line in partition in groups)
  • callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which then would be a "one-line-partition-in-group" / the same as the upper one) or Change (last line in partition in groups)


I'm assuming that each partition can only contain a single Create or CreateSystem, otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table, nor sample data in an easily consumed format:

;With Partitions as (
     Select
         t1.FK1,
         t1.FK2,
         t1.CreationTS as StartTS,
         t2.CreationTS as EndTS
     From
         Table t1
             left join
         Table t2
             on
                  t1.FK1 = t2.FK1 and
                  t1.FK2 = t2.FK2 and
                  t1.CreationTS < t2.CreationTS and
                  t2.ActionCode in ('Create','CreateSystem')
             left join
         Table t3
             on
                  t1.FK1 = t3.FK1 and
                  t1.FK2 = t3.FK2 and
                  t1.CreationTS < t3.CreationTS and
                  t3.CreationTS < t2.CreationTS and
                  t3.ActionCode in ('Create','CreateSystem')
       where
           t1.ActionCode in ('Create','CreateSystem') and
           t3.FK1 is null
), PartitionRows as (
     SELECT
         t1.FK1,
         t1.FK2,
         t1.ActionCode,
         t2.SomeAttributeValue,
         ROW_NUMBER() OVER (PARTITION_FRAGMENT_ID BY t1.FK1,T1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
     from
         Partitions t1
             inner join
         Table t2
             on
                t1.FK1 = t2.FK1 and
                t1.FK2 = t2.FK2 and
                t1.StartTS <= t2.CreationTS and
                (t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1

(Please note than I'm using all kinds of reserved names here)

The basic logic is: The Partitions CTE is used to define each partition in terms of the FK1, FK2, an inclusive start timestamp, and exclusive end timestamp. It does this by a triple join to the base table. the rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows from the result set where a match occurred from t3 - the result being that the row from t1 and the row from t2 represent the start of two adjacent partitions.

The second CTE then retrieves all rows from Table for each partition, but assigning a ROW_NUMBER() score within each partition, based on the CreationTS, sorted descending, with the result that ROW_NUMBER() 1 within each partition is the last row to occur.

Finally, within the select, we choose those rows that occur last within their respective partitions.

This does all assume that CreationTS values are distinct within each partition. I may be able to re-work it using PK also, if that assumption doesn't hold up.


It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):

WITH partitioned AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
  FROM data
),
subgroups AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup = 1,
    Subrank  = 1
  FROM partitioned
  WHERE rn = 1
  UNION ALL
  SELECT
    p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
    Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
    Subrank  = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
  FROM partitioned p
    INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
      AND p.rn = s.rn + 1
),
finalranks AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup, Subrank,
    rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
    /* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
  FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜