How to store and search for a sequence in a RDBMS?
I need to store some sequences/lists in the database (SQL Server 2008), and then find the ID for a particular sequence from the DB if any exists.
For example, I have two sequences:
Sequence 1: A,B,C Sequence 2: A,C,M,N
Currently they are stored in the following table. (I am OK with changing the table if it makes things easier.)
seq_id token order
1 A 0
1 B 1
1 C 2
2 A 0
2 C 1
2 M 2
2 N 3
I'd like to write a query to return the id of a given sequence, e.g. "A,B,C", if there is an exact match. The length of the se开发者_开发百科quence is unknown beforehand. Thank you!
What you need is called relational division (see Celko). The best solution will depend on your rdb engine. If you are able to do so - the most popular solution would be:
- Express you query as a table (a table of A,B,C)
- Inner join your table to the existing table, group by seq_id, count the elements of groups
- Use count to filter out sequences that are not exact (ie. when looking for A,B,C the count must be 3)
Let's say you have a #query table holding tokens and sorts you wish to find (I use sort instead of order to avoid conflicts with reserved keywords)
create table #query
(
token nvarchar(1)
,sort int
)
insert into #query select 'A',0
insert into #query select 'B',1
insert into #query select 'C',2
go
select seq_id
from dbo.sequences s
inner join #query q
on q.token = s.token
and q.sort = s.sort
group by s.seq_id
having count(*) = (select count(*) from #query)
Will return seq_id(s) that match your query. In newer versions of MsSql one would use a table variable instead of #query but the technique can be applied universally.
What you need is an order-specific signature for each sequence. Using SQL Server 2008, you can use the For Xml Path construct to assemble a signature for each sequence and the criteria and then compare one to the other. Obviously, this will not be fast. You can improve the speed tremendously by storing the signature into the appropriate table for each sequence at the time they are saved or their membership is changed (could also use a trigger). In addition, I simply used the raw signature here. However, normally, I'd be inclined to use a hash of the assembled value for the signature using the Hashbytes function.
Declare @TestInputs Table (
seq_id int not null
, token char(1) not null
, [order] int not null
)
Insert @TestInputs( seq_id, token, [order] )
Values (1,'A',0)
, (1,'B',1)
, (1,'C',2)
, (2,'A',0)
, (2,'C',1)
, (2,'M',2)
, (2,'N',3);
Declare @Criteria Table (
token char(1) not null
, [order] int not null
)
Insert @Criteria( token, [order] )
Values ('A',0)
, ('B',1)
, ('C',2);
With Criteria As
(
Select (
Select '|' + Token
From @Criteria
Order By [order]
For Xml Path('')
) As Signature
)
, InputSignatures As
(
Select T.seq_id
, (
Select '|' + T1.Token
From @TestInputs As T1
Where T1.seq_id = T.seq_id
Order By T1.[order]
For Xml Path('')
) As Signature
From @TestInputs As T
Group By T.seq_id
)
Select I.*
From InputSignatures As I
Join Criteria As C
On C.Signature = I.Signature
Why not to store it as is:
seq_id token
1 A,B,C
2 A,C,M,N
Query becomes trivial
精彩评论