Routines for suggesting alternatives
I have been tasked with coming up with a routine that will suggest alternative domain names to register if the customer's originally requested domain name is already registered.
The first step, I think, would be to split the requested domain back into its component words so that I could work out alternatives to try.
e.g. mybigredtruck.com would be broken up into "my", "big", "red" & "truck"
Then I would need some way of working out alternatives for these.
Does anybody know of any methods, components or web services that could do any of these things? Any ideas will be gratefully accepted.
Here is a good place to start with a matching algorithm:
Obtain a dictionary of words
Remove nonalphabetic characters from the input string
Remove the TLD extension from the input string
Assuming that the input text is spelt correctly, attempt to match it with a dictionary entry; if it does not match (as in the case of undelimited concatenated words), try one character less in a loop until it matches. Store the match, but keep looking for all other matches. Do the same for the remainder of the string.
The correct match would be the one where every substring of the full input string is matched, e.g., www.soilofgarden.com = 'soil of garden' and not 'so?? of garden'
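As a rough illustration of the steps above, here is a minimal Python sketch. The tiny WORDS set and the function names strip_domain and segmentations are made up for this example; a real routine would load a proper dictionary file. It strips the TLD before dropping non-letters, since the dots are needed to find the TLD, and it simply treats the last label as the TLD, so something like ".co.uk" would need extra handling.

```python
from functools import lru_cache

# Hypothetical, tiny word list for illustration; load a real dictionary in practice.
WORDS = frozenset({"my", "big", "red", "truck", "so", "soil", "of", "garden"})


def strip_domain(domain):
    """Drop a leading 'www' label and the last label (assumed to be the TLD),
    then keep only the letters a-z."""
    labels = domain.lower().split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    if len(labels) > 1:
        labels = labels[:-1]  # simplification: last label is treated as the TLD
    return "".join(ch for ch in "".join(labels) if "a" <= ch <= "z")


def segmentations(text, words=WORDS):
    """Return every way to split text completely into dictionary words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return [[]]  # reached the end: one valid (empty) split of nothing
        results = []
        # Try the longest candidate first, then one character less each time.
        for end in range(len(text), start, -1):
            piece = text[start:end]
            if piece in words:
                # Keep this match only if the remainder can also be split.
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return split_from(0)


print(segmentations(strip_domain("mybigredtruck.com")))
# [['my', 'big', 'red', 'truck']]
print(segmentations(strip_domain("www.soilofgarden.com")))
# [['soil', 'of', 'garden']]
```

Note that "so" is in the word list but never shows up in the result, because nothing in the dictionary can continue from "ilofgarden". That is the 'so?? of garden' case being rejected. With a full dictionary you will usually get several complete splits back and need some way to rank them, for example by preferring fewer, longer words.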
The most common implementation of suggestion algorithms that I have seen is to prepend or append relevant words. For domain names, the most common is to change the top-level domain (.com, .net, .gov, etc).
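A minimal sketch of that idea in Python, in case it helps. The function name suggest_alternatives and the default prefix, suffix and TLD lists are just placeholders I picked for the example; you would substitute words relevant to your customers. It only generates candidates and does not check whether they are actually free, which you would still have to do through whatever availability lookup your registrar provides.

```python
def suggest_alternatives(name, tld,
                         prefixes=("my", "the", "get"),
                         suffixes=("online", "shop"),
                         tlds=("com", "net", "org")):
    """Generate candidate domains by swapping the TLD and by
    prepending or appending words to the requested name."""
    candidates = []
    # 1. Same name under a different top-level domain.
    for other in tlds:
        if other != tld:
            candidates.append(f"{name}.{other}")
    # 2. Prepend a word, keeping the original TLD.
    for prefix in prefixes:
        candidates.append(f"{prefix}{name}.{tld}")
    # 3. Append a word, keeping the original TLD.
    for suffix in suffixes:
        candidates.append(f"{name}{suffix}.{tld}")
    return candidates


for domain in suggest_alternatives("bigredtruck", "com"):
    print(domain)
```

The order of the list is the order the suggestions would be shown in, so put whichever kind of alternative you consider closest to the original request first.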
As far as splitting a delimiter-less string by the most likely English words, I think you may be in for a rough time.
A Google search for "mybigredtruck" doesn't suggest "my big red truck" as an alternate search. To me, that implies that the algorithm is extremely complex, if one even exists.