Use NSRegularExpression to find how "close" a phrase is to another phrase

I am trying to figure out how to use NSRegularExpression to see how closely one string matches another. I know I can just create a set of substrings and check the NSRange for each one. E.g.

Matching "up", "to", "a", "point" against "up two a point", I could match 3 out of 4. Similarly, "up too a point" matches 3, and "Up to apoint" matches all 4.

I'm hoping that using regular expressions I could generalize the pattern matching so that I could just use "up to a point" and match it against what I find in another string, such as:

uptoapoint, Up to a point, UP TO A POINT, up too point, etc. and get a "percentage" match.

Not sure this is do-able, thus my question. Thanks for any help/advice.


Regex is certainly not the right tool for this.

Do this instead:

  1. Normalize the case of both strings by running them through [string uppercaseString] or [string lowercaseString].
  2. Compute the Levenshtein Distance between the normalized strings.
  3. Profit!

The Levenshtein Distance (or edit distance) is the minimum number of single-character substitutions, deletions, and insertions needed to turn stringA into stringB.
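The steps above could be sketched roughly as follows. This is plain C, so it can be dropped into an Objective-C file as-is; it is an illustration, not the implementation linked below. The names `levenshtein_ci` and `percent_match` are made up for this example, the case folding only handles ASCII (real NSString content would need Unicode-aware handling), and dividing by the longer string's length is just one common way to turn the distance into a percentage.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Case-insensitive Levenshtein distance between two ASCII C strings.
 * Uses a full (m+1) x (n+1) table for clarity.
 * Returns (size_t)-1 if the table cannot be allocated. */
size_t levenshtein_ci(const char *a, const char *b) {
    size_t m = strlen(a), n = strlen(b);
    size_t cols = n + 1;
    size_t *d = malloc((m + 1) * cols * sizeof *d);
    if (d == NULL) return (size_t)-1;

    for (size_t i = 0; i <= m; i++) d[i * cols] = i; /* delete i characters */
    for (size_t j = 0; j <= n; j++) d[j] = j;        /* insert j characters */

    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            /* Step 1 of the recipe: fold case before comparing (ASCII only). */
            int ca = tolower((unsigned char)a[i - 1]);
            int cb = tolower((unsigned char)b[j - 1]);
            size_t cost = (ca == cb) ? 0 : 1;
            d[i * cols + j] = min3(d[(i - 1) * cols + j] + 1,         /* deletion */
                                   d[i * cols + (j - 1)] + 1,         /* insertion */
                                   d[(i - 1) * cols + (j - 1)] + cost /* substitution */);
        }
    }

    size_t result = d[m * cols + n];
    free(d);
    return result;
}

/* Percentage match in [0, 100]; 100 means identical when case is ignored.
 * Assumption: the distance is normalized by the longer string's length.
 * Returns -1.0 if the distance could not be computed. */
double percent_match(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb;
    if (longest == 0) return 100.0; /* two empty strings match fully */

    size_t dist = levenshtein_ci(a, b);
    if (dist == (size_t)-1) return -1.0;
    return 100.0 * (1.0 - (double)dist / (double)longest);
}
```

With this sketch, "up to a point" against "UP TO A POINT" has distance 0 (a 100% match), against "up too a point" distance 1 (roughly 93%), and against "uptoapoint" distance 3 (roughly 77%).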

Objective-C Implementation of the Levenshtein Distance.

Extended Note: It does not look like you are in danger here, but it's worth noting that while the Levenshtein distance is pretty handy for comparing short strings, it is not very useful for calculating distances between entire documents. Most implementations of the Levenshtein Distance need O(m*n) memory (m and n being the lengths of your strings). Some implementations reduce this to O(min(m, n)) by keeping only two rows of the table, but their run time is still O(m*n), which is quadratic for strings of similar length.
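To illustrate the memory point, here is a minimal sketch of the two-row variant in plain C. The function name `levenshtein_two_rows` is made up for this example, and it compares raw bytes case-sensitively, so any case normalization would have to happen beforehand. It only keeps the previous and the current row of the table, so memory drops to O(min(m, n)), but both loops remain and the run time is unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Levenshtein distance that keeps only two rows of the table.
 * Memory: O(min(m, n)); time: still O(m * n).
 * Returns (size_t)-1 if a row cannot be allocated. */
size_t levenshtein_two_rows(const char *a, const char *b) {
    size_t m = strlen(a), n = strlen(b);

    /* Make b the shorter string so the rows are as short as possible. */
    if (n > m) {
        const char *ts = a; a = b; b = ts;
        size_t tl = m; m = n; n = tl;
    }

    size_t *prev = malloc((n + 1) * sizeof *prev);
    size_t *curr = malloc((n + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }

    /* Row 0: turning the empty prefix of a into b[0..j) takes j insertions. */
    for (size_t j = 0; j <= n; j++) prev[j] = j;

    for (size_t i = 1; i <= m; i++) {
        curr[0] = i; /* turning a[0..i) into the empty string takes i deletions */
        for (size_t j = 1; j <= n; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t best = prev[j] + 1;                                /* deletion */
            if (curr[j - 1] + 1 < best) best = curr[j - 1] + 1;       /* insertion */
            if (prev[j - 1] + cost < best) best = prev[j - 1] + cost; /* substitution */
            curr[j] = best;
        }
        /* The current row becomes the previous row for the next iteration. */
        size_t *tmp = prev; prev = curr; curr = tmp;
    }

    size_t result = prev[n]; /* after the final swap, prev holds the last row */
    free(prev);
    free(curr);
    return result;
}
```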


See http://en.wikipedia.org/wiki/Levenshtein_distance

Here is an implementation http://www.merriampark.com/ldobjc.htm
