Use NSRegularExpression to find how "close" a phrase is to another phrase
I am trying to figure out how to use NSRegularExpression to see how closely one string matches another. I know I can just create a set of substrings and check each one with NSRange. E.g.
matching "up", "to", "a", "point" against "up two a point", I could match 3 out of 4. Similarly, "up too a point" matches 3, and "Up to apoint" matches all 4.
I'm hoping that using regular expressions I could generalize the pattern matching so that I could just use "up to a point" and match it against what I find in another string, such as:
uptoapoint, Up to a point, UP TO A POINT, up too point, etc. and get a "percentage" match.
Not sure this is do-able, thus my question. Thanks for any help/advice.
Regex is certainly not the right tool for this.
Do this instead:
- Normalize both strings by running them through [string uppercaseString] or [string lowercaseString].
- Compute the Levenshtein distance between the normalized strings.
- …
- Profit!
The Levenshtein distance (or edit distance) is the minimum number of single-character substitutions, deletions, and insertions needed to turn stringA into stringB.
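To make the steps above concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The names levenshtein_ci and similarity_percent are my own, not from any library. The lowercasing is ASCII-only, and the "percentage" formula (distance divided by the length of the longer string) is just one reasonable assumption about how to score a match.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Case-insensitive (ASCII only) Levenshtein distance using a full
   (m+1) x (n+1) table. Returns (size_t)-1 if allocation fails.
   Hypothetical helper name, for illustration only. */
size_t levenshtein_ci(const char *a, const char *b) {
    size_t m = strlen(a), n = strlen(b);
    size_t cols = n + 1;
    size_t *d = malloc((m + 1) * cols * sizeof *d);
    if (!d) return (size_t)-1;

    for (size_t i = 0; i <= m; i++) d[i * cols] = i; /* delete i characters */
    for (size_t j = 0; j <= n; j++) d[j] = j;        /* insert j characters */

    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            int same = tolower((unsigned char)a[i - 1]) ==
                       tolower((unsigned char)b[j - 1]);
            d[i * cols + j] = min3(
                d[(i - 1) * cols + j] + 1,                   /* deletion     */
                d[i * cols + (j - 1)] + 1,                   /* insertion    */
                d[(i - 1) * cols + (j - 1)] + (same ? 0 : 1) /* substitution */
            );
        }
    }

    size_t result = d[m * cols + n];
    free(d);
    return result;
}

/* Assumed scoring: 100 * (1 - distance / length of the longer string).
   Two empty strings count as a 100% match; returns -1.0 on allocation failure. */
double similarity_percent(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb;
    if (longest == 0) return 100.0;
    size_t dist = levenshtein_ci(a, b);
    if (dist == (size_t)-1) return -1.0;
    return 100.0 * (1.0 - (double)dist / (double)longest);
}
```

With this scoring, "up to a point" against "UP TO A POINT" is a 100% match, while "uptoapoint" is 3 edits away (the three missing spaces) out of 13 characters, so roughly 77%. From Objective-C you could pass [string UTF8String] for each argument, keeping in mind that this sketch compares bytes, not Unicode characters.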
Objective-C Implementation of the Levenshtein Distance.
Extended note: It does not look like you are in danger here, but it's worth noting that while the Levenshtein distance is pretty handy for comparing short strings, it scales poorly to entire documents. Most implementations require O(m*n) memory (m and n being the lengths of your strings). Some implementations reduce this to O(min(m, n)) by keeping only two rows of the table, but their run time is still O(m*n), which is O(n^2) for strings of similar length.
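For what it's worth, the reduced-memory variant can be sketched like this in plain C. The function name levenshtein_two_rows is hypothetical, and the sketch assumes both inputs were already case-normalized. It keeps only the previous and current rows of the table, so the extra memory is proportional to the shorter string, while the nested loops still do m*n steps.

```c
#include <stdlib.h>
#include <string.h>

/* Levenshtein distance that stores only two rows of the table.
   Case-sensitive; assumes the caller already normalized both strings.
   Returns (size_t)-1 if allocation fails. Illustrative name only. */
size_t levenshtein_two_rows(const char *a, const char *b) {
    size_t m = strlen(a), n = strlen(b);

    /* Make b the shorter string so each row has min(m, n) + 1 cells. */
    if (n > m) {
        const char *ts = a; a = b; b = ts;
        size_t tl = m; m = n; n = tl;
    }

    size_t *prev = malloc((n + 1) * sizeof *prev);
    size_t *curr = malloc((n + 1) * sizeof *curr);
    if (!prev || !curr) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }

    /* Row 0: turning the empty prefix of a into the first j chars of b. */
    for (size_t j = 0; j <= n; j++) prev[j] = j;

    for (size_t i = 1; i <= m; i++) {
        curr[0] = i; /* delete the first i chars of a */
        for (size_t j = 1; j <= n; j++) {
            size_t del = prev[j] + 1;
            size_t ins = curr[j - 1] + 1;
            size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            size_t best = del < ins ? del : ins;
            curr[j] = best < sub ? best : sub;
        }
        /* The row just computed becomes the previous row. */
        size_t *tmp = prev; prev = curr; curr = tmp;
    }

    size_t result = prev[n];
    free(prev);
    free(curr);
    return result;
}
```

The time cost is unchanged, so this only helps when the full table would not fit in memory, not when the comparison itself is too slow.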
See http://en.wikipedia.org/wiki/Levenshtein_distance
Here is an implementation http://www.merriampark.com/ldobjc.htm