SQL Server - Multi-Column substring matching

2022-12-22 23:56

One of my clients is hooked on multi-column substring matching.

I understand that CONTAINS and FREETEXT search for words (and, at least in the case of CONTAINS, word prefixes). However, based on my understanding of this MSDN book, neither of these nor their variants is capable of searching for substrings.

I have used LIKE rather extensively (Select * from A where A.B Like '%substr%').

Sample table A:

ID | Col1     | Col2     | Col3     |
-------------------------------------
1  | oklahoma | colorado | Utah     |
2  | arkansas | colorado | oklahoma |
3  | florida  | michigan | florida  |
-------------------------------------

The following code will give us row 1 and row 2:

 select * from A where Col1 like '%klah%' or Col2 like '%klah%' or Col3 like '%klah%'

This is rather ugly, probably slow, and I just don't like it very much, probably because the implementations that I'm dealing with have 10+ columns that need to be searched.

The following may be a slight improvement as far as code readability goes, but as far as performance, we're still in the same ballpark.

 select * from A where (Col1 + ' ' + Col2 + ' ' + Col3) like '%klah%'
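One caveat with the concatenated form: with SQL Server's default CONCAT_NULL_YIELDS_NULL behavior, a NULL in any one column makes the whole expression NULL, so that row never matches. The sample table does not say whether the columns are nullable, so the following is only a sketch under the assumption that they are; it wraps each column in ISNULL.

```sql
-- Sketch only: assumes Col1..Col3 are nullable character columns on table A.
-- ISNULL replaces a NULL with an empty string so one NULL column
-- does not turn the whole concatenated expression into NULL.
SELECT *
FROM A
WHERE (ISNULL(Col1, '') + ' ' + ISNULL(Col2, '') + ' ' + ISNULL(Col3, '')) LIKE '%klah%';
```

Note also that a search term containing the separator (a space here) can match across a column boundary, which the OR version cannot do.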

I have thought about adding insert, update, and delete triggers that write the concatenated version of the above columns into a separate table that shadows this table.

Sample Shadow_Table:

ID | searchtext                 |
---------------------------------
1  | oklahoma colorado Utah     |
2  | arkansas colorado oklahoma |
3  | florida michigan florida   |
---------------------------------

This would allow us to perform the following query to search for '%klah%'

select * from Shadow_Table where searchtext like '%klah%'

I really don't like having to remember that this shadow table exists and that I'm supposed to use it when I am performing multi-column substring matching, but it probably yields pretty quick reads at the expense of write performance and storage space.
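For reference, the trigger idea could look roughly like the following. This is a sketch, not a tested implementation: it assumes ID is the primary key of A, the column type and length of searchtext are guesses, and the trigger name trg_A_Shadow is made up for illustration.

```sql
-- Hypothetical shadow table keyed by the base table's ID.
CREATE TABLE Shadow_Table (
    ID         int           NOT NULL PRIMARY KEY,
    searchtext varchar(1000) NOT NULL   -- assumed length
);
GO

-- One trigger handles all three operations by rebuilding the shadow
-- rows for every base row touched by the statement.
CREATE TRIGGER trg_A_Shadow ON A
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Drop the shadow rows of all affected base rows.
    DELETE s
    FROM Shadow_Table AS s
    WHERE s.ID IN (SELECT ID FROM deleted)
       OR s.ID IN (SELECT ID FROM inserted);

    -- Re-create shadow rows for rows that still exist
    -- (inserted is empty for a DELETE, so nothing is added back).
    INSERT INTO Shadow_Table (ID, searchtext)
    SELECT i.ID,
           ISNULL(i.Col1, '') + ' ' + ISNULL(i.Col2, '') + ' ' + ISNULL(i.Col3, '')
    FROM inserted AS i;
END;
GO
```

Using the inserted and deleted pseudo-tables as sets keeps the trigger correct for multi-row statements, which a row-at-a-time trigger body would not be.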

My gut feeling tells me that there is an existing solution built into SQL Server 2008. However, I don't seem to be able to find anything other than research papers on the subject.

Any help would be appreciated.

From your description it sounds like you are looking for a way to improve exact searching. LIKE is the proper tool to use when you are trying to find character strings that EXACTLY match your string. If you are worried about performance, then you should consider indexing, or even a custom index such as the one you've described.

Maybe consider a persisted computed column instead of a shadow table. The overhead on inserts/updates should be less than using triggers, and the query time will probably be equivalent.
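A minimal sketch of the persisted computed column, assuming the sample table A from the question; the column name searchtext is borrowed from the shadow-table example and the ISNULL wrapping is an added assumption for nullable columns:

```sql
-- Add a computed column that SQL Server stores and maintains itself.
-- ISNULL and string concatenation are deterministic, which PERSISTED requires.
ALTER TABLE A
ADD searchtext AS (ISNULL(Col1, '') + ' ' + ISNULL(Col2, '') + ' ' + ISNULL(Col3, '')) PERSISTED;
GO  -- separate batch so the new column is visible to the query below

-- The search then reads like the shadow-table query, but against A itself.
SELECT ID, Col1, Col2, Col3
FROM A
WHERE searchtext LIKE '%klah%';
```

Keep in mind that a pattern with a leading wildcard such as '%klah%' cannot be answered with an index seek, so even an indexed searchtext column would still be scanned; the gain is mainly that only one column has to be examined.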

On Full Text Search

Full text searching is designed as a natural language search.

Consider it from the end user's perspective. If I were searching for "Oklahoma", I would probably start with either "okla" or "ok" or "oklahoma". I would not search for "homa". This is the way our human minds think. Hence, "natural" language searching.

Natural language searching uses root stems and similar words to increase the total number of results. However, it is not optimal if you want all the results to specifically match your search term: for example, FREETEXT will match "I drove to my lesson" with "driving and lessons" even though neither word specifically appears.
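To make the difference concrete, here is a rough sketch of the two full-text predicates against the sample table. It assumes a full-text index on A; the catalog name ft_A_catalog and the unique key index name PK_A are hypothetical, and the index has to finish populating before the queries return anything.

```sql
-- Hypothetical setup: PK_A is assumed to be the unique index on A.ID.
CREATE FULLTEXT CATALOG ft_A_catalog;
CREATE FULLTEXT INDEX ON A (Col1, Col2, Col3)
    KEY INDEX PK_A ON ft_A_catalog;
GO

-- Prefix term: matches words that START with "okla", such as "oklahoma".
SELECT ID FROM A WHERE CONTAINS((Col1, Col2, Col3), '"okla*"');

-- A prefix term is anchored at the start of a word, so this does not
-- find "oklahoma"; a mid-word fragment still needs LIKE '%homa%'.
SELECT ID FROM A WHERE CONTAINS((Col1, Col2, Col3), '"homa*"');

-- FREETEXT matches on meaning (stemmed word forms) rather than exact text.
SELECT ID FROM A WHERE FREETEXT((Col1, Col2, Col3), 'driving lessons');
```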
