Check if one column value is contained in another column value (TSQL)?

2023-01-10 12:33 问答作者：

Hey, I have 2 tables with many columns and I want to find those rows where the value from table1.somecolumn is contained in table2.someothercolumn. Example:

table1.somecolumn has开发者_如何学运维 Smith, Peter and

table2.someothercolumn has peter.smith

That should be a match, how would I do such a search?

Thanks :)

You can try either SOUNDEX or DIFFERENCE functions to help match string literals.

Example:

select difference('peter.green', 'Green, Peter')

returns 2, whereby:

The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values.

See SOUNDEX and DIFFERENCE topics on MSDN.

Update:

Soundex & Difference may not function well when the order of words are considered, but if you have full text indexing capabilities installed, you don't need to create an index to use the word breaking and parsing capabilities of the full text engine. Assuming you're using SQL Server 2008, the following function will return you a list of normalised terms:

SELECT * FROM sys.dm_fts_parser('"Peter Green"', 1033, 0, 0)

Through which you can CROSS APPLY to the remainder of your query.

See the sys.dm_fts_parser topic & Section K. Using Apply within the FROM topic for more info.

Example: (SQL Server Enterprise 2008 with Full Text Engine enabled)

if not OBJECT_ID('Names1', 'Table') is null drop table names1
if not OBJECT_ID('Names2', 'Table') is null drop table names2

create table Names1 
(
    id int identity(0, 1),
    name nvarchar(128)
)
insert into Names1 (name) values ('Green, Peter')
insert into Names1 (name) values ('Smith, Peter')
insert into Names1 (name) values ('Aadland, Beverly')
insert into Names1 (name) values ('Aalda, Mariann')
insert into Names1 (name) values ('Aaliyah')
insert into Names1 (name) values ('Aames, Angela')
insert into Names1 (name) values ('Aames, Willie')
insert into Names1 (name) values ('Aaron, Caroline')
insert into Names1 (name) values ('Aaron, Quinton')
insert into Names1 (name) values ('Aaron, Victor')
insert into Names1 (name) values ('Abbay, Peter')
insert into Names1 (name) values ('Abbott, Dorothy')
insert into Names1 (name) values ('Abbott, Bruce')
insert into Names1 (name) values ('Abbott, Bud')
insert into Names1 (name) values ('Abbott, Philip')
insert into Names1 (name) values ('Abdoo, Rose')
insert into Names1 (name) values ('Abdul, Paula')
insert into Names1 (name) values ('Abel, Jake')
insert into Names1 (name) values ('Abel, Walter')
insert into Names1 (name) values ('Abeles, Edward')
insert into Names1 (name) values ('Abell, Tim')
insert into Names1 (name) values ('Aber, Chuck')

create table Names2
(
    id int identity(200, 1),
    name nvarchar(128)
)
insert into Names2 (name) values (LOWER('Peter.Green'))
insert into Names2 (name) values (LOWER('Peter.Smith'))
insert into names2 (name) values (LOWER('Beverly.Aadland'))
insert into names2 (name) values (LOWER('Mariann.Aalda'))
insert into names2 (name) values (LOWER('Aaliyah'))
insert into names2 (name) values (LOWER('Angela.Aames'))
insert into names2 (name) values (LOWER('Willie.Aames'))
insert into names2 (name) values (LOWER('Caroline.Aaron'))
insert into names2 (name) values (LOWER('Quinton.Aaron'))
insert into names2 (name) values (LOWER('Victor.Aaron'))
insert into names2 (name) values (LOWER('Peter.Abbay'))
insert into names2 (name) values (LOWER('Dorothy.Abbott'))
insert into names2 (name) values (LOWER('Bruce.Abbott'))
insert into names2 (name) values (LOWER('Bud.Abbott'))
insert into names2 (name) values (LOWER('Philip.Abbott'))
insert into names2 (name) values (LOWER('Rose.Abdoo'))
insert into names2 (name) values (LOWER('Paula.Abdul'))
insert into names2 (name) values (LOWER('Jake.Abel'))
insert into names2 (name) values (LOWER('Walter.Abel'))
insert into names2 (name) values (LOWER('Edward.Abeles'))
insert into names2 (name) values (LOWER('Tim.Abell'))
insert into names2 (name) values (LOWER('Chuck.Aber'));

with ftsNamesFirst (id, term) as
(
    select id, terms.display_term
        from names1 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
), ftsNamesSecond (id, term) as
(
select id, terms.display_term
        from names2 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
)
select * from 
(
    select 
    ROW_NUMBER() over (partition by nfirst.id order by sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) desc) ranking,
    sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) Confidence,
    nFirst.id Names1ID,
    nFirst.name Names1Name, 
    nSecond.id Names2ID,
    nSecond.name Names2Name
    from 
    ftsNamesFirst cross join ftsNamesSecond 
    left outer join names1 nFirst on nFirst.id = ftsNamesFirst.id
    left outer join names2 nSecond on nSecond.id = ftsNamesSecond.id 
    where DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term) = 4
    group by 
        nFirst.id, nFirst.name, nSecond.id, nSecond.name
) MatchedNames 
where ranking = 1

Outputs:

Where the matches with highest confidence take precedence (all others are filtered out using a windowed ranking query).

Confidence Names1ID Names1Name Names2ID Names2Name
8   0   Green, Peter    200 peter.green
8   1   Smith, Peter    201 peter.smith
8   2   Aadland, Beverly    202 beverly.aadland
8   3   Aalda, Mariann  203 mariann.aalda
4   4   Aaliyah 204 aaliyah
8   5   Aames, Angela   205 angela.aames
8   6   Aames, Willie   206 willie.aames

It's not perfect, but this is a nice starting point from where it can be tweaked to give a higher probability of success.

There are several possible solutions depending on exactly what you need: Use could create auxiliary table to store keyword for every record

Use auxiliary table to store keywords for every record or record and field. E.g. table_helper(id int primary key, record_id int, keyword varchar), record_id - link to source table. Fill this table in trigger for table1, table2. Query to find common rows is a simple intersection of table_helper with itself. You could create one helper for table1 and table2 or use separate tables.
Use full-text indexes.

继续阅读：substring tsql

Check if one column value is contained in another column value (TSQL)?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？