How to change word-break characters in SQL Server Full-Text indexing

By default, when you tell SQL Server (I'm currently using 2008) to full-text index a column, it treats characters such as "@" and "." as word breakers, just like " ".

I'd like to restrict the word-breaking characters to just " ", so that "joe.bloggs@somewhere.com" is treated as a single word.
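
To illustrate the difference I mean, here is a rough Python sketch. It is not SQL Server's actual word-breaking logic, and the function names `break_on_punctuation` and `break_on_spaces` are made up for this example; it only shows the tokens I get today versus the tokens I would like to get.

```python
import re


def break_on_punctuation(text):
    """Split on whitespace, '@' and '.' (roughly the behaviour I see today)."""
    return [token for token in re.split(r"[\s@.]+", text) if token]


def break_on_spaces(text):
    """Split on the space character only (the behaviour I would like)."""
    return [token for token in text.split(" ") if token]


if __name__ == "__main__":
    sample = "mail joe.bloggs@somewhere.com"
    print(break_on_punctuation(sample))
    print(break_on_spaces(sample))
```

With the first function the address is broken into "joe", "bloggs", "somewhere" and "com"; with the second it stays together as one token.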

It appears that one can choose a "Language for Word Breaker" against the indexed column - perhaps I need to set up a custom language?

Does anyone know how I can do this?


To get your own word breaker working with SQL Server, you have to disable signature verification and add your COM CLSID to the registry. For more info, check out this post (it helped me a lot): http://blogs.msdn.com/shajan/default.aspx However, I never managed to create my own language, so I simply hijacked an existing one.


According to TechNet's article on SQL 2008 Full-Text Search:

well-known published interfaces provide the framework for Full-Text Engine extensibility. For more information, see the Microsoft Developer Network (MSDN) topics IFilter, IWordBreaker, and IStemmer.

So, at least according to this article, you can write a custom IWordBreaker implementation (see http://www.siao2.com/2005/03/14/395199.aspx for more info) and get SQL Server to use it.

What I haven't found so far is how to plug your custom word breaker into SQL Server itself, that is, how to tell SQL Server to use it. Sorry for the incomplete answer... I hope this gets you at least part of the way to a solution.
