Are there any string comparison algorithms out there that are "better" than Levenshtein distance?
I have been using it for a project I am working on, but some of the results aren't what I would choose. For example:
When "Date" is compared to
- "State" it has a lev distance of 2
- "Today's Date" it has a lev distance of 8
This is what we would expect from the algorithm, of course, but I'm curious whether anyone knows of something that ranks a candidate higher when it contains an exact match of the source string ("Date"). In other words, "Today's Date" would rank above "State" because it has "Date" in it.
Bonus points if you can find a .NET library that implements this.
You probably want the longest common subsequence instead?
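For illustration, here is a minimal Python sketch of scoring by longest common subsequence. The function names `lcs_length` and `lcs_score` are made up for this example, and dividing by the source length is just one possible normalization, not something the question prescribes.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ended before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping one character.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(source: str, candidate: str) -> float:
    """Hypothetical score: fraction of the source found, in order, in the candidate."""
    if not source:
        return 0.0
    return lcs_length(source, candidate) / len(source)


print(lcs_score("Date", "State"))         # 0.75 ("ate" is shared)
print(lcs_score("Date", "Today's Date"))  # 1.0  (all of "Date" is present)
```

With this score "Today's Date" ranks above "State". Note that a subsequence does not have to be contiguous, so scattered letters also count towards the score.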
I think you are meant to tokenize the string into words before applying Levenshtein. As an alternative, there is Jaro-Winkler distance too.
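A rough Python sketch of the tokenizing idea: split the candidate on whitespace and keep the smallest Levenshtein distance to any single token. The helper name `best_token_distance` and the choice of whitespace as the separator are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_token_distance(source: str, candidate: str) -> int:
    """Hypothetical helper: distance from source to the closest word in candidate."""
    tokens = candidate.split() or [candidate]
    return min(levenshtein(source, token) for token in tokens)


print(levenshtein("Date", "State"))                 # 2
print(best_token_distance("Date", "State"))         # 2
print(best_token_distance("Date", "Today's Date"))  # 0, the token "Date" matches exactly
```

Comparing token by token means the extra words in "Today's Date" no longer count against it. For a multi-word source you would need to decide how to pair tokens up, which this sketch does not attempt.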
There's a .NET library, SimMetrics, which seems to cover a few alternatives.
To do it properly you need some context about the use case.
If you are trying to do an address lookup, then "Nosuch STREET" might be a perfect match for "Nosuch ROAD", or in a no-fly list you want all 20 spellings of Gadaffi to match.
If you are trying to analyse how much a piece of historic text has changed through copying, then you need a different algorithm.
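As one example of building that context in, an address lookup could normalize the input before comparing anything. The Python sketch below is only illustrative: the `STREET_TYPES` set, the `<way>` placeholder and the `normalize_address` name are all invented for this example, and a real lookup would need a much fuller word list.

```python
# Hypothetical, deliberately tiny list of interchangeable street-type words.
STREET_TYPES = {"street", "st", "road", "rd", "avenue", "ave", "lane", "ln"}


def normalize_address(text: str) -> str:
    """Lower-case, drop full stops, and collapse street-type words to one token."""
    words = text.lower().replace(".", "").split()
    return " ".join("<way>" if word in STREET_TYPES else word for word in words)


print(normalize_address("Nosuch STREET"))  # nosuch <way>
print(normalize_address("Nosuch ROAD"))    # nosuch <way>
print(normalize_address("Nosuch STREET") == normalize_address("Nosuch ROAD"))  # True
```

After normalization the two addresses are identical, so any distance measure applied afterwards treats them as a perfect match.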