Articles and advice for implementing a full-text search form

I need to create a full-text search form for a database of emails / support tickets (in C#) and I'm looking for advice and articles on how to approach this. In particular I'd like to know how to approach the classic full-text search problems, for example:

  • Making sure that matches are sensible, for example if someone enters "big head" and a document contains "big hairy head", making sure that document is returned in the search.
  • Ordering results by relevancy.
  • How to best display matches, for example highlighting matching terms.

I know that full-text search is a fairly mammoth subject area in itself; I'm just looking for simple articles and advice on how to create something that is at least marginally useful and usable.

I've used things like Lucene.Net before - obviously some sort of full-text index is going to be needed - the challenging bit is taking the list of documents that Lucene returns and presenting it in a useful way.

UPDATE: I want to clarify slightly what I mean - there are hundreds of generic full-text search forms that all perform a very similar function, for example:

  • The search button on each and every internet forum
  • The search button on each and every wiki
  • Windows / Google Desktop search
  • Google

Each of those searches takes information from a different source and displays it by different means (HTML, Windows Forms, etc.), but each solves the same problems with methods of varying complexity, and for the most part (with the possible exception of desktop search) the input data is in the same format: HTML or text.

I'm looking for advice and common strategies on how to do things like rank search results in ways that are likely to be useful to the user.

Alternatively, one strategy I had considered was taking some wiki software, exporting my entire data set into that wiki as text, and just using the wiki's own search. The sort of search I'm after is, for all intents and purposes, functionally identical to 99% of searches that already exist; I just want to give it a different input data source and format the output slightly differently (both of which I already know how to do).

Surely there must be some advice on how those sorts of searches are done?


You can use Lucene.Net, a great library from Apache. The LINQ to Lucene extensions can also simplify your work.
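
On the presentation side the question raises (highlighting the matched terms once the index has returned its hits), a minimal sketch follows. It is written in Python for brevity rather than C#, and `highlight` and its parameters are hypothetical names, not part of Lucene.Net. It assumes whitespace-separated query terms and HTML output.

```python
import html
import re


def highlight(text, query, open_tag="<b>", close_tag="</b>"):
    """Return HTML-escaped text with every whole-word query term wrapped in tags."""
    # Escape each term so user input is matched literally, not as a regex.
    terms = [re.escape(t) for t in query.split() if t]
    if not terms:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the unmatched gap, then wrap the (escaped) match in the tags.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(open_tag + html.escape(match.group(0)) + close_tag)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("A big hairy head", "big head"))
```

The same idea should port to C# with `Regex` and an HTML-encoding helper. A real implementation would probably also want stemming, so that "heads" is highlighted for the query "head".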


SQL Server (including Express, in its Advanced Services edition) has a full-text search facility. It can search text within columns, and it can also use IFilters to search within embedded documents. You can use the FREETEXTTABLE function in T-SQL to search content by meaning and return the matches in ranked order:

"Returns a table of zero, one, or more rows for those columns containing character-based data types for values that match the meaning, but not the exact wording, of the text in the specified freetext_string. FREETEXTTABLE can only be referenced in the FROM clause of a SELECT statement like a regular table name.

Queries using FREETEXTTABLE specify freetext-type full-text queries that return a relevance ranking value (RANK) and full-text key (KEY) for each row."

For example:

SELECT FT_TBL.CategoryName 
    ,FT_TBL.Description
    ,KEY_TBL.RANK
FROM dbo.Categories AS FT_TBL 
    INNER JOIN FREETEXTTABLE(dbo.Categories, Description, 
        'sweetest candy bread and dry meat') AS KEY_TBL
        ON FT_TBL.CategoryID = KEY_TBL.[KEY];

For more information, have a read of "Understanding SQL Server Full-Text Indexing".


Have a look at CONTAINSTABLE too, as it supports wildcards, term weighting, and more:

http://msdn.microsoft.com/en-us/library/ms189760.aspx
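
To illustrate the kind of search condition CONTAINSTABLE accepts, here is a small sketch that turns user input such as "big head" into a proximity (NEAR) condition or a weighted (ISABOUT) condition. It is Python for brevity, the helper names are made up for this example, and the resulting string is assumed to be passed to the query as a parameter rather than concatenated into the SQL text.

```python
def _clean_terms(user_query):
    """Split on whitespace and drop double quotes so terms stay safely quoted."""
    terms = [t.replace('"', "") for t in user_query.split()]
    return [t for t in terms if t]


def near_condition(user_query):
    """Build a proximity condition, e.g. '"big" NEAR "head"'.

    Returns an empty string for empty input; the caller should skip the
    search in that case, since an empty condition is not a valid query.
    """
    return " NEAR ".join('"{}"'.format(t) for t in _clean_terms(user_query))


def weighted_condition(weighted_terms):
    """Build an ISABOUT condition from (term, weight) pairs, weights in 0.0-1.0."""
    parts = []
    for term, weight in weighted_terms:
        cleaned = term.replace('"', "")
        parts.append('"{}" WEIGHT ({:.1f})'.format(cleaned, weight))
    return "ISABOUT ({})".format(", ".join(parts))


print(near_condition("big head"))
print(weighted_condition([("big", 0.8), ("head", 0.4)]))
```

With a NEAR condition, a document containing "big hairy head" can still match the query "big head", and terms that sit closer together should tend to rank higher.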


If you don't want to go the SQL route then also consider Microsoft Search Server 2008 Express - it's free, powerful and looks easy to use. It matches all your requirements and handles things like ranking automatically.


This is a database-specific question, so you need to specify which database you will use. You can pass the search keywords to the database engine instead of doing the searching in your own program.
