Search full name with single input box
Let's say I have a table that contains the following fields:
UserId | FirstName | LastName
I'd like to put together a search form with a single input for finding people by name. I've seen quite a few questions that touch upon this, but am looking for some guidance from someone that has implemented this in a large database (hundreds of thousands of "User" records in my case).
Ideally, I'd like the semantics of the search to be somewhat intelligent; to give some examples:
Case 1:
Input: Jo
Sample Matches: "John Doe", "Jane Johnson"
Case 2:
Input: J D (note the space between 'J' and 'D')
Sample Matches: "John Doe", "Jane Doe"
I'm sure there are some additional edge cases that I'm missing here, such as people with multi-part first and/or last names - so even better if a possible solution accounted for those sorts of things.
I'm using NHibernate 2.1.2 GA backed by a SQL Server 2008 R2 database for reference, and have no problem using hql, criteria or even raw SQL if开发者_开发问答 that fits the bill. Using full text search is also an option if that's what I'll need for decent performance figures on this. I'm not sure NHibernate Search is worth the hassle in this case (since my needs are so focused), so let's assume I don't want to go that route.
Having implemented this particular feature with criteria i have to say that the most straightforward way is generating a series of ICriterion LIKE's on the fly after having split the input, using the space as a delimiter. Performance is not really bad, even on a db that had something around the million-mark entries, but since LIKE queries do not take advantage of indexes its only natural that as the data set expands the query will be less performant.
Assumptions that can be made to increase performance preemptively are to treat tokens (the parts split by space) as the beginning only, which will result in ..LIKE 'JO%' which is faster than ..LIKE '%JO%', or even to treat single character tokens differently (like your second example). In my case, because i used this in autocompleted input fields i ignored them: the user must acknowledge that he is searching JO for John and JA for Jane, J would return nothing (or to be more precise the query didn't execute).
Afterwards i implemented this using Sql Server's Fulltext and the performance difference was impressive to say the least. As always, it depends on the size of the data set, and Fulltext indexes have a maintenance overhead, at least on 2005 that i used.
The lucene option is not a bad choice either, its fast and not difficult to implement and it unlocks the option for smart result sets like "Did you mean John" when inputting "Jon". Also its way more manageable than Sql Server's Fulltext.
EDIT, comment answer
I'm just saying that i have done all 3 options above... the basic LIKE
approach worked well but after the initial implementation i was looking for perf improvements and changed the LIKE with Sql Server's FullText option (CONTAINS
)... Both worked in production well...
For the query generation part, if i remember correctly, i still generated dynamically the query fragments for each token, for each column (FirstName, LastName) but full-text's are better than LIKE's in actual query execution time
in development i changed the FullText with Lucene, and while the perf characteristics are similar to fulltext, the other aspects (development, maintenance, extensions) are way better with Lucene/NHibernate.Search. I didn't have the chance to work with Sql Server's 2008 FullText implementation, which allegedly is better than 2005's
As a boot-note, if you don't go the LIKE way and if you want to move your application to a different datastore than Sql Server, then decoupling your fulltext queries by using Lucene/NHibernate.Search is the better solution
Rather than attack this with a full on fuzzy string search, your best bet is to do a series of searches in sequence and give the results of the first one that returns anything.
So first do an exact search, where the search string matches either first or last name, then do a wild card search, where the first or last name matches the search string with a %
on either side.
After that, you can try the fancier options. Split the search string on the space, and try firstName = split(0) & "%", lastName = split(1) & "%". Next, try first name or second name = replace(searchString, " ", "%") (turning all spaces into wild cards).
This will perform a series of smart searches that will give priority to better matches. It will also have the added bonus effect of returning results faster when the user gives an exact match.
精彩评论