FULLTEXT search with a multi-language column
Is there a way to use FULLTEXT in a multi-language table without giving each language its own column?
I have one column I need to search, but the language in that column varies:
ProductID int
Description nvarchar(max)
Language char(2)
Language can be one of: en, de, it, kr, th
Currently I build a concordance and use that for searching. But it only covers English, German and Italian, and even for those it doesn't support stemming. Everything else uses LIKE '%searchterm%', and I'm trying to improve on that.
I'm using SQL Server 2005.
Instead of a separate column per language, if you know which rows contain which language, you could create one indexed view per language, each filtered to include only the rows of that language, and put a full-text index on each view. You'll need to query each view individually, though.
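A rough sketch of what that could look like for German, in case it helps. The table name dbo.Products, the view and index names, and the catalog name are all made up for illustration, and it assumes ProductID is NOT NULL and unique within a single language (the full-text key has to be a unique, single-column, non-nullable index). Check sys.fulltext_languages to see which LCIDs your instance actually supports.

```sql
-- Assumed base table (names are hypothetical):
--   dbo.Products (ProductID int NOT NULL, Description nvarchar(max), Language char(2))
-- Indexed views require SET ANSI_NULLS ON and SET QUOTED_IDENTIFIER ON.

-- One schema-bound view per language, filtered on the Language column.
CREATE VIEW dbo.Products_de
WITH SCHEMABINDING
AS
SELECT ProductID, Description
FROM dbo.Products
WHERE Language = 'de';
GO

-- The unique clustered index turns the view into an indexed view
-- and doubles as the full-text key index.
CREATE UNIQUE CLUSTERED INDEX IX_Products_de
ON dbo.Products_de (ProductID);
GO

CREATE FULLTEXT CATALOG ProductCatalog;
GO

-- 1031 is the LCID for German; use 1033 for English, 1040 for Italian, etc.
CREATE FULLTEXT INDEX ON dbo.Products_de (Description LANGUAGE 1031)
KEY INDEX IX_Products_de
ON ProductCatalog;
GO

-- Query the view for that language; FORMSOF(INFLECTIONAL, ...) asks for stemming.
SELECT ProductID
FROM dbo.Products_de
WHERE CONTAINS(Description, N'FORMSOF(INFLECTIONAL, Haus)');
```

You would repeat the view, the index and the full-text index for each language, and pick the right view at query time based on the language being searched.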
Quoting from the Microsoft reference on CREATE FULLTEXT INDEX:
For non-BLOB and non-XML columns containing text data in multiple languages, or for cases when the language of the text stored in the column is unknown, it might be appropriate for you to use the neutral (0x0) language resource. However, first you should understand the possible consequences of using the neutral (0x0) language resource. For information about the possible solutions and consequences of using the neutral (0x0) language resource, see Best Practices for Choosing a Language When Creating a Full-Text Index.
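For reference, a minimal sketch of the neutral-language option on the table itself. It assumes a hypothetical unique, single-column, non-nullable index named PK_Products on dbo.Products and an existing catalog named ProductCatalog; none of those names come from the question. With the neutral resource you give up language-specific word breaking and stemming, so this mostly buys you indexed word matching instead of LIKE scans.

```sql
-- Hypothetical names: dbo.Products, PK_Products, ProductCatalog.
-- LANGUAGE 0x0 selects the neutral language resource for this column.
CREATE FULLTEXT INDEX ON dbo.Products (Description LANGUAGE 0x0)
KEY INDEX PK_Products
ON ProductCatalog;
GO

-- Plain word match; no stemming is applied under the neutral resource.
SELECT ProductID
FROM dbo.Products
WHERE CONTAINS(Description, N'searchterm');
```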
I know this is an old question, but I just encountered it.
One approach I have seen is to use an XML column and specify the xml:lang attribute, as mentioned in CREATE FULLTEXT INDEX (Transact-SQL):
For documents stored in XML- or BLOB-type columns, the language encoding within the document will be used at indexing time. For example, in XML columns, the xml:lang attribute in XML documents will identify the language. At query time, the value previously specified in language_term becomes the default language used for full-text queries unless language_term is specified as part of a full-text query.
The main downside of this approach is that it changes the data type to XML, but it seemed to work fine for our needs at the time.
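Roughly, the setup looks like the sketch below. The table, column, element and catalog names are invented for the example (it assumes a catalog called ProductCatalog already exists); only xml:lang itself is the standard attribute the documentation refers to.

```sql
-- Hypothetical table holding the description as XML instead of nvarchar(max).
CREATE TABLE dbo.ProductText (
    ProductID      int NOT NULL CONSTRAINT PK_ProductText PRIMARY KEY,
    DescriptionXml xml NOT NULL
);
GO

-- Each document declares its own language via xml:lang.
INSERT INTO dbo.ProductText (ProductID, DescriptionXml)
VALUES (1, N'<desc xml:lang="de">Ein kleines Haus am See</desc>');

INSERT INTO dbo.ProductText (ProductID, DescriptionXml)
VALUES (2, N'<desc xml:lang="en">A small house by the lake</desc>');
GO

-- No LANGUAGE clause on the column: the language found in each document
-- is used at indexing time.
CREATE FULLTEXT INDEX ON dbo.ProductText (DescriptionXml)
KEY INDEX PK_ProductText
ON ProductCatalog;
GO

-- At query time, state the language of the search term explicitly.
SELECT ProductID
FROM dbo.ProductText
WHERE CONTAINS(DescriptionXml, N'Haus', LANGUAGE 1031);
```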
I am using views for 20+ languages. Works fine for querying (if a little complex to select the correct view to use in sprocs). However, inserts and updates on the underlying table get clobbered as the plan seems to need to include a check on every ft view even with no change tracking.