Identify a specific sequence of records in a table

2022-12-13 00:20 问答作者：

Assume a table with the fields TransactionId, ItemId, Code, EffectiveDate, and CreateDate.

+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code |   EffectiveDate  |     CreateDate   |
+---------------+--------+------+------------------+------------------+
|              1|       1|     8| 12/2/2009 1:13 PM| 12/2/2009 1:13 PM|
+---------------+--------+------+------------------+------------------+
|              4|       1|    51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
|              2|       1|    14|12/2/2009 11:09 AM|12/2/2009 11:09 AM|
+---------------+--------+------+------------------+------------------+
|              3|       1|    61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
|              5|       1|    28| 12/3/2009 9:33 AM| 12/3/2009 9:33 AM|
+---------------+--------+------+------------------+------------------+
|              6|       1|     9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+

I need to get the set of records where the sequence 51, 61, 9 occurs for a given ItemId, sorted by EffectiveDate. There could be other records with other codes in between these records.

In this case, I would return TransactionId's 4, 3, and 6, as shown below.

+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code |   EffectiveDate  |     CreateDate   |
+---------------+--------+------+------------------+------------------+
|              4|       1|    51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
|              3|       1|    61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
|              6|       1|     9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+

Note that:

This isn't the only sequence I'll need to identify, but it illustrates the problem.
Records can be inserted into the table out of order; that is, the 61 record could have开发者_StackOverflow社区 been inserted first, followed by the 51, and then the 9. You can see this in the example where for the code 51 record the CreateDate is later than the EffectiveDate.
The order of the sequence matters. So, the sequence 61, 9, 51 would not return any records, but 51, 61, 9 would.

A DB approach is ideal if it's simple (i.e. no cursors or overly complicated stored procedure), but a code approach could also work, although it would result in a significant amount of data transfer out of the DB.

The environment is SQL Server 2005 and C#/.NET 3.5.

Actually, you could get a couple of fairly simple solutions leveraging ranking/windowing functions and/or CTEs and recursive CTEs.

Create a procedure that accepts a character-based comma-separated list of Code values you are looking for in the sequence you want them in - use any of a dozen possible ways to split this list into a table/set that is made up of the sequence and Code value, resulting in a table with a structure like this:

declare @sequence table (sequence int not null, Code int not null);

Once you have this, it's simply a matter of sequencing the source set based on joining the sequenced table to the source table on the same Code values for a given ItemId - once you have the source set filtered and sequenced, you can simply join again based on the matching sequence values - this is sounding complex, but in reality it would be a single query like this:

with srcData as (
    select  row_number() over(order by t.EffectiveDate) as rn,
            t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
    from    #TableName t
    join    @sequence s
    on      t.Code = s.Code
    where   t.ItemId = @item_id
)
select  d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from    srcData d
join    @sequence s
on      d.rn = s.sequence
and     d.Code = s.Code
order by d.rn;

This alone won't guarantee that you get a result-set that is identical to what you are looking for, but staging the data into a temp table and adding a few simple checks around the code would do the trick (for example, add a checksum validation and a sum of the code values)

declare @tempData table (rn int, TransactionId smallint, ItemId smallint, Code smallint, EffectiveDate datetime, CreateDate datetime);

with srcData as (
    select  row_number() over(order by t.EffectiveDate) as rn,
            t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
    from    #TableName t
    join    @sequence s
    on      t.Code = s.Code
    where   t.ItemId = @item_id
)
insert  @tempData
        (rn, TransactionId, ItemId, Code, EffectiveDate, CreateDate)
select  d.rn, d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from    srcData d
join    @sequence s
on      d.rn = s.sequence
and     d.Code = s.Code;

-- Verify we have matching hash/sums    
if
(
    ( (select sum(Code) from @sequence) = (select sum(Code) from @tempData) )
    and
    ( (select checksum_agg(checksum(sequence, Code)) from @sequence) = (select checksum_agg(checksum(rn, Code)) from @tempData) )
)
begin;
    -- Match - return the resultset
    select  d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
    from    @tempData d
    order by d.rn;

end;

If you want to do it all inline, you could use a different approach leveraging CTEs and recursion to perform a running sum/total and OrdPath-like comparison as well (though you'd still need to parse the sequence character data out into a dataset)

-- Sequence data with running total
with sequenceWithRunningTotal as
(
    -- Anchor
    select  s.sequence, s.Code, s.Code as runningTotal, cast(s.Code as varchar(8000)) as pth,
            sum(s.Code) over(partition by 1) as sumCode
    from    @sequence s
    where   s.sequence = 1
    -- Recurse
    union all
    select  s.sequence, s.Code, b.runningTotal + s.Code as runningTotal,
            b.pth + '.' + cast(s.Code as varchar(8000)) as pth,
            b.sumCode as sumCode
    from    @sequence s
    join    sequenceWithRunningTotal b
    on      s.sequence = b.sequence + 1
),
-- Source data with sequence value
srcData as 
(
    select  row_number() over(order by t.EffectiveDate) as rn,
            t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
            sum(t.Code) over(partition by 1) as sumCode
    from    #TableName t
    join    @sequence s
    on      t.Code = s.Code
    where   t.ItemId = @item_id
),
-- Source data with running sum
sourceWithRunningSum as
(
    -- Anchor
    select  t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
            t.Code as runningTotal, cast(t.Code as varchar(8000)) as pth,
            t.sumCode
    from    srcData t
    where   t.rn = 1
    -- Recurse
    union all
    select  t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
            s.runningTotal + t.Code as runningTotal,
            s.pth + '.' + cast(t.Code as varchar(8000)) as pth,
            t.sumCode
    from    srcData t
    join    sourceWithRunningSum s
    on      t.rn  = s.rn + 1
)
select  d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from    sourceWithRunningSum d
join    sequenceWithRunningTotal s
on      d.rn = s.sequence
and     d.Code = s.Code
and     d.runningTotal = s.runningTotal
and     d.pth = s.pth
and     d.sumCode = s.sumCode
order by d.rn;

A DB approach is ideal if it's simple (i.e. no cursors or overly complicated stored procedure)

I don't believe a pure DB approach ("pure" meaning only using SQL SELECT) is practical because the type of SQL I envision would require very convoluted self-joins, field concatenation, MAX() functions, etc. This type of SQL might be a fun academic answer to a puzzle in Joe Celko's "SQL for Smarties" book but I don't think that's appropriate for production code.

I think the realistic approach is to write some kind of loop that keeps track of state. Your problem in the general sense is very similar to writing code for stateful inspection of TCPIP packets for spam filtering or scanning credit-card transactions for fraudulent patterns. All these problems share similar characteristics: the actions you take on the current row(record) depends on what records you saw previously (the context)... and that aspect requires holding state variables.

If you want to avoid round-tripping the data for analysis, it looks like Transact-SQL is the best way for performance. Or use hosted CLR to take advantage of C# syntax while still keeping the processing within the database engine.

This is just off the top of my head and is untested, so it may need some tweaking:

SELECT DISTINCT
     T.TransactionID,
     T.ItemID,
     T.Code,
     T.EffectiveDate,
     T.CreateDate
FROM
     My_Table T
INNER JOIN (
     SELECT
          T1.TransactionID,
          T2.TransactionID,
          T3.TransactionID
     FROM
          My_Table T1
     INNER JOIN My_Table T2 ON
          T2.ItemID = T1.ItemID AND
          T2.Code = 61 AND
          T2.EffectiveDate > T1.EffectiveDate
     INNER JOIN My_Table T3 ON
          T3.ItemID = T1.ItemID AND
          T3.Code = 9 AND
          T3.EffectiveDate > T2.EffectiveDate
     WHERE
          T1.Code = 51
     ) SQ ON
     SQ.TransactionID = T1.TransactionID OR
     SQ.TransactionID = T2.TransactionID OR
     SQ.TransactionID = T3.TransactionID

继续阅读：sql sql-server sql-server-2005

Identify a specific sequence of records in a table

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？