
Comparing strings in Python to find errors

I have a string that is the correct spelling of a word:

FOO

I would like to allow someone to mistype the word in ways such as:

FO, F00, F0O, FO0

Is there a nice way to check for this? Lowercase input should also count as correct, or the input could be converted to uppercase first, whichever is cleaner.


One approach is to calculate the edit distance between the strings. You can, for example, use the Levenshtein distance, or invent your own distance function that treats 0 and O as closer to each other than 0 and P.
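As a rough illustration of the custom-distance idea, here is a minimal sketch of a Levenshtein-style distance in which look-alike substitutions are cheaper than ordinary ones. The function name `weighted_edit_distance`, the `similar` set, and the cost of 0.5 are all assumptions made for this example, not part of any library.

```python
def weighted_edit_distance(a, b, similar=frozenset({("0", "O"), ("1", "I")})):
    """Levenshtein-style distance; look-alike substitutions cost 0.5, others 1."""
    a, b = a.upper(), b.upper()  # treat lower case as correct

    def sub_cost(x, y):
        if x == y:
            return 0.0
        if (x, y) in similar or (y, x) in similar:
            return 0.5  # assumed cost for look-alike characters
        return 1.0

    # prev is the previous row of the dynamic-programming table:
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [float(i)]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                 # delete x
                curr[j - 1] + 1.0,             # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute x with y
            ))
        prev = curr
    return prev[-1]
```

With these assumed costs, every typo listed in the question is within a distance of 1.0 of 'FOO', so a check such as `weighted_edit_distance(word, 'FOO') <= 1.0` would accept them. The threshold is something you would need to tune for your own word list.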

Another is to transform each word into a canonical form and compare the canonical forms. You can, for example, convert the string to uppercase, replace all 0s with Os and 1s with Is, and so on, then collapse consecutive repeated letters.

>>> import itertools
>>> def canonical_form(s):
...     s = s.upper()
...     s = s.replace('0', 'O')
...     s = s.replace('1', 'I')
...     # Collapse runs of repeated characters, e.g. 'FOO' -> 'FO'
...     s = ''.join(k for k, g in itertools.groupby(s))
...     return s
...
>>> canonical_form('FO')
'FO'
>>> canonical_form('F00')
'FO'
>>> canonical_form('F0O')
'FO'
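To turn the canonical form into an actual check, you could compare the canonical form of the input with that of the correct word. The sketch below repeats `canonical_form` so that it runs on its own; the helper name `matches` is made up for this example.

```python
import itertools

def canonical_form(s):
    # Uppercase, map look-alike digits to letters, collapse repeated characters
    s = s.upper().replace('0', 'O').replace('1', 'I')
    return ''.join(k for k, _ in itertools.groupby(s))

def matches(candidate, correct='FOO'):
    # Hypothetical helper: two words match if their canonical forms are equal
    return canonical_form(candidate) == canonical_form(correct)

for word in ['FO', 'F00', 'F0O', 'FO0', 'foo', 'BAR']:
    print(word, matches(word))
```

One thing to keep in mind: because repeated letters are collapsed, an input like 'FOOOO' is also accepted. Whether that is acceptable depends on how lenient you want to be.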


The standard-library module difflib has a get_close_matches function.

You can use it like this:

>>> import difflib
>>> difflib.get_close_matches('FO', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'])
[]
>>> difflib.get_close_matches('F0O', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('FO0', ['FOO', 'BAR', 'BAZ'])
['FOO']

Notice that it doesn't match one of your cases: 'F00' scores below the default cutoff of 0.6. You could lower the cutoff parameter to get a match:

>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'], cutoff=0.3)
['FOO']
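To see why 'F00' needs a lower cutoff, it may help to look at the similarity score directly. get_close_matches ranks candidates with difflib.SequenceMatcher, whose ratio() is twice the number of matching characters divided by the combined length of both strings. The helper name `similarity` below is just for illustration.

```python
import difflib

def similarity(a, b):
    # ratio() = 2 * (number of matching characters) / (len(a) + len(b))
    return difflib.SequenceMatcher(None, a, b).ratio()

for typo in ['FO', 'F00', 'F0O', 'FO0']:
    print(typo, round(similarity('FOO', typo), 3))
# FO 0.8
# F00 0.333
# F0O 0.667
# FO0 0.667
```

Only the leading 'F' of 'F00' lines up with 'FOO', giving 2 * 1 / 6, which is why it is the one case that falls under the default cutoff. A low cutoff such as 0.3 will also let through quite dissimilar words once the candidate list grows, so it may be worth normalizing look-alike characters before calling get_close_matches instead.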


You can use the re module:

import re
re.compile(r'f(o|0)+', re.I)  # re.I ignores case

You can use curly braces to limit the number of occurrences too. You can also get fancy, define your own "leet" sets, and substitute them in with %s, as in:

ay = r'(a|4|\$)'  # $ must be escaped to match a literal dollar sign
oh = r'(o|0|\))'  # alternatives are separated by |, not commas
re.compile(r'f%s+' % oh, re.I)
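Putting the pieces together, a minimal sketch of the regex approach might look like this. It uses a character class for the look-alike set, curly braces to allow one or two O-like characters, and fullmatch so that longer words are rejected. The names `OH`, `FOO_PATTERN`, and `is_foo` are invented for this example.

```python
import re

# Hypothetical look-alike set for the letter O; extend as needed
OH = r'[o0]'

# 'f' followed by one or two O-like characters, ignoring case
FOO_PATTERN = re.compile(r'f%s{1,2}' % OH, re.IGNORECASE)

def is_foo(s):
    # fullmatch requires the whole string to match, so 'FOOD' is rejected
    return FOO_PATTERN.fullmatch(s) is not None

for word in ['FO', 'F00', 'F0O', 'FO0', 'foo', 'FOOD']:
    print(word, is_foo(word))
```

Using match or search instead of fullmatch would also accept strings that merely start with or contain a FOO-like sequence, which is probably not what you want for a spelling check.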