Search strategy
I'm writing a Java program that needs to find possible matches for specified strings. The strings will generally take one of these forms:
onetwothree
one.two.three
onesomethingtwoblah
onesomething
where one, two and three are parts of an actual title. Candidate matches from the database are stored in the form one+two+three. The method I have come up with is to compare each token of a database candidate against the entire specified string using a regex. A counter of how many of a candidate's tokens match is then used to rank the possible matches.
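A minimal sketch of the approach described in the question, assuming candidates arrive as plain strings like "one+two+three" and that matching should be case-insensitive. The class and method names (`RegexTokenRanker`, `score`, `rank`) are hypothetical. Each token is wrapped in `Pattern.quote` so a literal token is never misread as regex syntax, and candidates are sorted by how many of their tokens occur anywhere in the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexTokenRanker {

    /** A database candidate paired with how many of its tokens were found in the input. */
    record Match(String candidate, int score) {}

    /** Counts how many '+'-separated tokens of the candidate occur anywhere in the input. */
    static int score(String candidate, String input) {
        int count = 0;
        for (String token : candidate.split("\\+")) {
            if (token.isEmpty()) continue;
            // Pattern.quote makes the token match literally (e.g. a '.' is not a wildcard)
            Pattern p = Pattern.compile(Pattern.quote(token), Pattern.CASE_INSENSITIVE);
            if (p.matcher(input).find()) count++;
        }
        return count;
    }

    /** Scores every candidate against the whole input; returns non-zero scores, best first. */
    static List<Match> rank(List<String> candidates, String input) {
        List<Match> result = new ArrayList<>();
        for (String c : candidates) {
            int s = score(c, input);
            if (s > 0) result.add(new Match(c, s));
        }
        // Highest score first; List.sort is stable, so ties keep their input order
        result.sort((a, b) -> Integer.compare(b.score(), a.score()));
        return result;
    }

    public static void main(String[] args) {
        List<String> db = List.of("one+two+three", "one+four", "five+six");
        // Prints "one+two+three 2" then "one+four 1"
        for (Match m : rank(db, "onesomethingtwoblah")) {
            System.out.println(m.candidate() + " " + m.score());
        }
    }
}
```

Note that every call to `rank` compiles and runs one regex per token of every candidate over the full input, which is exactly the cost the answer below is concerned about.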
My concern is the accuracy of the matches presented and whether the method will reliably find matches when they do exist. Is this method efficient?
It depends. If you have a lot of database records and long strings to compare against, the search may end up being quite expensive, because it has to scan the entire input string once for every record.
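As a rough back-of-the-envelope estimate (an assumption: each regex search is treated as one linear scan of the input), with R records, k_r tokens in record r and an input of n characters, the per-record approach does about this much work per query:

```latex
\text{work} \approx \sum_{r=1}^{R} k_r \cdot n = K\,n, \qquad K = \sum_{r=1}^{R} k_r
```

So the cost grows with the total number of stored tokens K multiplied by the input length n, and every new query pays it again.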
You could instead consider doing a single pass over the input string and looking up its tokens in the database. A smart search index could help speed this up. When matching multiple tokens, you would need a way of knowing when to stop scanning and advance to the next token. Partial matches could help here: store one+two+three also as the separate entries one, two and three. Or, if the order matters, store it as one, one+two and one+two+three.
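A minimal sketch of the "store the tokens separately" variant, assuming an in-memory `HashMap` from token to titles stands in for a real database index (the class `TokenIndex` and its methods are hypothetical). The input is walked once from left to right; at each position only one lookup per distinct token length is needed, so the work no longer grows with the number of records:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TokenIndex {
    // Hypothetical stand-in for a database index: lowercase token -> titles containing it
    private final Map<String, Set<String>> index = new HashMap<>();
    // Distinct token lengths in ascending order, so each position needs only a few lookups
    private final TreeSet<Integer> lengths = new TreeSet<>();

    void add(String title) {
        for (String token : title.split("\\+")) {
            if (token.isEmpty()) continue;
            String key = token.toLowerCase(Locale.ROOT);
            index.computeIfAbsent(key, k -> new HashSet<>()).add(title);
            lengths.add(key.length());
        }
    }

    /** One left-to-right pass over the input; returns title -> number of distinct tokens found. */
    Map<String, Integer> search(String input) {
        String s = input.toLowerCase(Locale.ROOT);
        Map<String, Set<String>> found = new HashMap<>(); // title -> distinct matched tokens
        for (int i = 0; i < s.length(); i++) {
            for (int len : lengths) {
                if (i + len > s.length()) break; // longer tokens cannot fit either
                String piece = s.substring(i, i + len);
                Set<String> titles = index.get(piece);
                if (titles == null) continue;
                for (String t : titles) {
                    found.computeIfAbsent(t, k -> new HashSet<>()).add(piece);
                }
            }
        }
        Map<String, Integer> scores = new HashMap<>();
        found.forEach((t, toks) -> scores.put(t, toks.size()));
        return scores;
    }

    public static void main(String[] args) {
        TokenIndex idx = new TokenIndex();
        for (String t : List.of("one+two+three", "one+four", "five+six")) idx.add(t);
        // one+two+three scores 3, one+four scores 1, five+six is absent
        System.out.println(idx.search("one.two.three"));
    }
}
```

In a real system the `HashMap` would be replaced by whatever index the database offers; the point of the design is that the input is scanned once and each piece is a cheap keyed lookup.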
Basically, as you scan, you keep a list of candidate DB entries that gets smaller and smaller, comparable to a faceted search.
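A minimal sketch of the ordered-prefix variant, assuming titles are indexed under every prefix (one, one+two, one+two+three) in an in-memory map standing in for the database; `PrefixNarrowing` and its methods are hypothetical. Each time the next token in order is found, the matched prefix grows and the candidate set is replaced by the smaller set stored under that longer prefix. The scan is greedy (it takes the first, shortest matching token at each position), which is a simplification:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PrefixNarrowing {
    // Hypothetical stand-in for DB rows stored by ordered prefix, e.g.
    // "one" -> {one+two+three, one+four}, "one+two" -> {one+two+three}
    private final Map<String, Set<String>> byPrefix = new HashMap<>();
    private final TreeSet<Integer> lengths = new TreeSet<>();

    void add(String title) {
        StringBuilder prefix = new StringBuilder();
        for (String token : title.toLowerCase(Locale.ROOT).split("\\+")) {
            if (token.isEmpty()) continue;
            if (prefix.length() > 0) prefix.append('+');
            prefix.append(token);
            byPrefix.computeIfAbsent(prefix.toString(), k -> new LinkedHashSet<>()).add(title);
            lengths.add(token.length());
        }
    }

    /** Scans the input once; every matched token extends the prefix and narrows the candidates. */
    Set<String> search(String input) {
        String s = input.toLowerCase(Locale.ROOT);
        String prefix = "";
        Set<String> candidates = Set.of();
        int i = 0;
        while (i < s.length()) {
            int advance = 1; // no next-in-order token starts here: move one character on
            for (int len : lengths) {
                if (i + len > s.length()) break;
                String piece = s.substring(i, i + len);
                String key = prefix.isEmpty() ? piece : prefix + "+" + piece;
                Set<String> next = byPrefix.get(key);
                if (next != null) { // next token in order found: narrow the candidate set
                    prefix = key;
                    candidates = next;
                    advance = len;
                    break;
                }
            }
            i += advance;
        }
        return new LinkedHashSet<>(candidates); // copy so callers cannot modify the index
    }

    public static void main(String[] args) {
        PrefixNarrowing pn = new PrefixNarrowing();
        for (String t : List.of("one+two+three", "one+four", "five+six")) pn.add(t);
        System.out.println(pn.search("onesomethingtwoblah")); // [one+two+three]
        System.out.println(pn.search("onefour"));             // [one+four]
    }
}
```

For "onesomethingtwoblah" the candidates go from {one+two+three, one+four} after "one" to {one+two+three} after "two"; the stretch "something" is skipped because no stored prefix continues with it.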