How to store and search for a sequence in an RDBMS?

I need to store some sequences/lists in the database (SQL Server 2008), and then find the ID for a particular sequence from the DB if any exists.

For example, I have two sequences:

Sequence 1: A,B,C
Sequence 2: A,C,M,N

Currently they are stored in the following table. (I am OK with changing the table if it makes things easier.)

seq_id   token   order
1        A       0
1        B       1
1        C       2
2        A       0
2        C       1
2        M       2
2        N       3

I'd like to write a query to return the id of a given sequence, e.g. "A,B,C", if there is an exact match. The length of the sequence is unknown beforehand. Thank you!


What you need is called relational division (see Celko). The best solution depends on your database engine. The most popular approach is:

  1. Express your query as a table (a table of A,B,C)
  2. Join your query table to the existing table, group by seq_id, and count the elements of each group
  3. Use the counts to filter out sequences that are not exact matches (i.e. when looking for A,B,C, both the number of matched rows and the length of the sequence must be 3)

Let's say you have a #query table holding the tokens and sort positions you wish to find (I use sort instead of order as the column name, because order is a reserved keyword):

create table #query
(
token nvarchar(1)
,sort int
)

insert into #query select 'A',0
insert into #query select 'B',1
insert into #query select 'C',2
go

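-- left join keeps every row of each sequence, so count(*) is the sequence length
select  s.seq_id
from    dbo.sequences s
left join  #query q
    on  q.token = s.token
        and q.sort = s.sort
group by s.seq_id
having count(*) = (select count(*) from #query)          -- same length as the query
    and count(q.sort) = (select count(*) from #query)    -- every position matched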

This returns the seq_id(s) that match your query. In SQL Server you could use a table variable (or, from SQL Server 2008, a table-valued parameter) instead of #query, but the technique can be applied in any engine.
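As a rough sketch of the table-variable route, the same join can be wrapped in a stored procedure that takes the query as a table-valued parameter. The type name dbo.TokenList and the procedure name dbo.FindSequence are made up for illustration, and the sketch assumes dbo.sequences has the columns seq_id, token and sort.

```sql
-- Hypothetical table type describing a query sequence.
create type dbo.TokenList as table
(
    token nvarchar(1) not null
   ,sort  int         not null primary key
);
go

-- Hypothetical procedure: returns the seq_id(s) whose tokens match @query exactly.
create procedure dbo.FindSequence
    @query dbo.TokenList readonly
as
begin
    select  s.seq_id
    from    dbo.sequences s
    left join @query q
        on  q.token = s.token
        and q.sort = s.sort
    group by s.seq_id
    having  count(*) = (select count(*) from @query)         -- same length as the query
        and count(q.sort) = (select count(*) from @query);   -- every position matched
end
go

-- Usage: look for the sequence A,B,C.
declare @q dbo.TokenList;
insert into @q (token, sort) values (N'A', 0), (N'B', 1), (N'C', 2);
exec dbo.FindSequence @query = @q;
```

The primary key on sort guarantees that each stored row joins to at most one query row, which is what makes the two counts comparable.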


What you need is an order-specific signature for each sequence. In SQL Server 2008 you can use the For Xml Path construct to assemble a signature for each stored sequence and for the criteria, and then compare the two. Computed on the fly like this, it will not be fast. You can improve the speed considerably by storing the signature alongside each sequence when it is saved or when its membership changes (a trigger could also do this). I simply used the raw signature here; normally I'd be inclined to use a hash of the assembled value, computed with the Hashbytes function.

Declare @TestInputs Table   (
                            seq_id int not null
                            , token char(1) not null
                            , [order] int not null
                            )
Insert @TestInputs( seq_id, token, [order] )
Values (1,'A',0)
    , (1,'B',1)
    , (1,'C',2)
    , (2,'A',0)
    , (2,'C',1)
    , (2,'M',2)
    , (2,'N',3);

Declare @Criteria Table (
                        token char(1) not null
                        , [order] int not null
                        )   
Insert @Criteria( token, [order] )
Values ('A',0)
    , ('B',1)
    , ('C',2);  

With Criteria As
    (
    Select  (
            Select '|' + Token
            From @Criteria
            Order By [order]
            For Xml Path('')
            ) As Signature
    )
    , InputSignatures As
    (
    Select T.seq_id
        ,   (
            Select '|' + T1.Token
            From @TestInputs As T1
            Where T1.seq_id = T.seq_id
            Order By T1.[order]
            For Xml Path('')
            ) As Signature
    From @TestInputs As T
    Group By T.seq_id
    )
Select I.*
From InputSignatures As I
    Join Criteria As C  
        On C.Signature = I.Signature
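The stored-hash variant mentioned before the code might look roughly like this. It is only a sketch: the table dbo.sequence_signatures and its index are hypothetical, the source table is assumed to be dbo.sequences with columns seq_id, token and [order], and signatures are assumed to fit in nvarchar(4000), since Hashbytes in SQL Server 2008 accepts at most 8000 bytes of input.

```sql
-- Hypothetical table holding one precomputed signature hash per sequence.
Create Table dbo.sequence_signatures (
    seq_id int not null primary key
    , signature_hash varbinary(20) not null   -- SHA1 produces 20 bytes
    );
Create Index IX_sequence_signatures_hash
    On dbo.sequence_signatures ( signature_hash );

-- Build the '|'-delimited signature per sequence, then store its hash.
With Signatures As
    (
    Select S.seq_id
        ,   (
            Select '|' + S1.token
            From dbo.sequences As S1
            Where S1.seq_id = S.seq_id
            Order By S1.[order]
            For Xml Path('')
            ) As Signature
    From dbo.sequences As S
    Group By S.seq_id
    )
Insert dbo.sequence_signatures( seq_id, signature_hash )
Select seq_id, HashBytes('SHA1', Cast(Signature As nvarchar(4000)))
From Signatures;

-- Lookup: hash the wanted signature the same way and compare.
Declare @Wanted nvarchar(4000) = N'|A|B|C';
Select seq_id
From dbo.sequence_signatures
Where signature_hash = HashBytes('SHA1', @Wanted);
```

Because two different signatures can in principle produce the same hash, it may be worth re-checking the raw signature of any seq_id the lookup returns.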


Why not store it as is:

seq_id   token
1        A,B,C       
2        A,C,M,N

The query becomes trivial.
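For instance, assuming a hypothetical table dbo.sequence_strings with the two columns shown above, the lookup would be a plain equality test:

```sql
-- Hypothetical table: one row per sequence, tokens stored as a comma-separated string.
create table dbo.sequence_strings
(
    seq_id int not null primary key
   ,token  varchar(900) not null   -- kept short enough to be indexable
);

insert into dbo.sequence_strings (seq_id, token) values (1, 'A,B,C'), (2, 'A,C,M,N');

-- Exact-match lookup for the sequence A,B,C.
select seq_id
from   dbo.sequence_strings
where  token = 'A,B,C';
```

This only works if no token can itself contain the delimiter, and it gives up the ability to query individual tokens relationally.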
