
Are there any string comparison algorithms out there that are "better" than Levenshtein distance?

I have been using it for a project I am working on, but some of the results aren't what I would choose. For example:

When "Date" is compared to

  1. "State" it has a lev distance of 2
  2. "Today's Date" it has a lev distance of 8

This is what we would expect from the algorithm, of course, but I'm curious whether anyone knows of something that gives a closer match to any compared string containing an exact match of the source string ("Date"). In other words, "Today's Date" would rank higher because it has "Date" in it.

Bonus points if you can find a .NET library that implements this.


You probably want the longest common subsequence?
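To make that suggestion concrete, here is a minimal sketch of the longest common subsequence length using the usual dynamic programming table, kept to two rows. The function name `lcs_length` and the idea of dividing by the source length to get a score are my own assumptions for illustration, not something from a particular library.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (case-sensitive)."""
    # prev[j] holds the LCS length of the already-processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the subsequence found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping a char in a or in b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


source = "Date"
for candidate in ("State", "Today's Date"):
    # Hypothetical score: fraction of the source covered by the subsequence.
    score = lcs_length(source, candidate) / len(source)
    print(candidate, lcs_length(source, candidate), score)
```

With this measure "State" shares only "ate" with "Date" (length 3), while "Today's Date" shares all four characters, so it ranks higher. Note that a subsequence does not have to be contiguous, so this can also reward scattered matches.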


I think you are meant to tokenize the string into words before applying Levenshtein. As an alternative, there is Jaro-Winkler distance too.
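A rough sketch of the tokenizing idea, assuming whitespace splitting is good enough and that the best (smallest) per-word distance is what you want to rank by. The helper `best_token_distance` is a hypothetical name for illustration, and `levenshtein` is a plain textbook implementation rather than any library's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), case-sensitive."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_token_distance(source: str, candidate: str) -> int:
    """Smallest distance between source and any whitespace-separated word."""
    # Assumption: fall back to the whole string if there are no tokens.
    tokens = candidate.split() or [candidate]
    return min(levenshtein(source, token) for token in tokens)


print(best_token_distance("Date", "State"))         # 2
print(best_token_distance("Date", "Today's Date"))  # 0
```

Because "Today's Date" contains the word "Date" exactly, its best token distance is 0 and it now ranks ahead of "State". If the source itself can be several words, you would need to compare token sets against each other instead of one word against many.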

There's a .NET library, SimMetrics, which seems to cover a few alternatives.


To do it properly, you need some context about the use.

If you are trying to do an address lookup, then "Nosuch STREET" might count as a perfect match for "Nosuch ROAD", or in a no-fly list you want all 20 spellings of Gadaffi to match.

If you are trying to analyse how much a piece of historic text has changed with copying, then you need a different algorithm.
