Database search engine - Sort by relevance according to specific relevance rules
I just got a list of requirements for a new search function for our websites. They're book publisher's websites, so that's the basic field we're operating in here. The data is stored in a Microsoft SQL 2005 database server (SP3) with fulltext enabled.
Now, the requirements state that the search can be done in three areas: book title, auth开发者_开发技巧or names, and book texts. That in itself is easy enough to do with three separate queries. However, there's more. The requirements state that search results should return in roughly the following order:
- Full title matches
- Full author name matches
- Partial title matches
- Full author lastname matches
- Partial author lastname matches
- Partial author fullname matches
- Book text matches
Furthermore, there's secondary requirements:
- Titles from the publisher itself should be ordered higher than that of a neighboring publisher (there's books from a dozen or so publishers in the same database)
- When a full match for a title is found, other books from the same author should be displayed (with again the titles from the same publisher getting precedence over the other - an author can publish with multiple publishers)
Lots of rules like that.
So, say you have a book Johnson, written by an author Pete Johnson (or whatever). The search query 'john' should then return the following (ish):
- Johnson (book) (partial title match)
- Pete Johnson (partial lastname match)
And the search query 'Johnson':
- Johnson (book) (full title match)
- Johnson (book) (partial title match) (omitted, already in the results)
- Pete Johnson (author) (full author lastname match)
- Pete Johnson (author) (partial author lastname match) (omitted)
- Pete Johnson (author) (partial author fullname match) (omitted)
- Johnson (book) (book text match) (omitted)
...Anyways. That's basically the requirements, and I just wanted to type that out. Now, for a few questions:
- Are there any books or articles about this particular subject you could point me to?
- How would this best be implemented? Can this be done in a set of basic queries (separate queries for each search requirement, post-processing to remove duplicates and merge the results), or can it be done in a single query?
- Or do I need to write an application that indexes the table and creates its own indexes and the like, which the search queries in turn?
I'm kinda grasping for ideas and suggestions here.
There're CONTAINSTABLE and FREETEXTTABLE functions - they return RANK column that is is "relevance ranking". Probably these functions plus some complex ordering by non-text columns will do the job.
If you decide to implement FTS in your app, have a look at third-party solutions. Lucene (or Lucene.NET) is probably good to start with.
精彩评论