Determine the probability of a numeric typing error

I have:

  1. A correct numerical ID, such as a phone number, Social Security number, etc.
  2. Another number, taken from a data-entry form

The second number is similar to, but not equal to, the first number. Both numbers are valid.

I want to calculate how probable it is that the second number is actually a mistyped version of the first.

Such errors may include:

Does anyone know of an existing algorithm or code for this?

Edit:

I'm not looking for a general string-similarity algorithm. I'm looking for an algorithm optimized for human number-entry typing errors, or for some research about this topic.


There are several algorithms for measuring string similarity.

You could implement some variant of the Levenshtein distance or Damerau-Levenshtein distance that rates the types of errors differently.
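A minimal sketch of such a variant, assuming the restricted form of Damerau-Levenshtein (optimal string alignment) and made-up costs: a swap of two adjacent digits costs 0.5, a substitution between neighbouring keys on an assumed phone-style keypad costs 0.5, and any other substitution, insertion or deletion costs 1.0. The keypad layout, the weights, and the names `KEYPAD`, `substitution_cost` and `weighted_osa_distance` are all hypothetical; real weights would have to come from observed entry errors, and the result is a weighted distance, not a probability.

```python
# Assumed phone-style keypad (1-2-3 on the top row, 0 below 8).
# This layout and all costs below are illustrative assumptions.
KEYPAD = {
    "1": (0, 0), "2": (0, 1), "3": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "7": (2, 0), "8": (2, 1), "9": (2, 2),
    "0": (3, 1),
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of typing digit b where digit a was intended (assumed weights)."""
    if a == b:
        return 0.0
    (r1, c1), (r2, c2) = KEYPAD[a], KEYPAD[b]
    # Neighbouring keys, diagonals included, are treated as likelier slips.
    if max(abs(r1 - r2), abs(c1 - c2)) == 1:
        return 0.5
    return 1.0


def weighted_osa_distance(s: str, t: str,
                          indel_cost: float = 1.0,
                          transpose_cost: float = 0.5) -> float:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance
    between two digit strings, with a separate cost per error type."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost  # delete every digit of s
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost  # insert every digit of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # digit dropped
                d[i][j - 1] + indel_cost,  # extra digit typed
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
            # Two adjacent digits typed in swapped order.
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transpose_cost)
    return d[n][m]


print(weighted_osa_distance("5551234", "5552134"))  # 0.5 (1 and 2 swapped)
print(weighted_osa_distance("5551234", "5551284"))  # 1.0 (3 -> 8, not neighbours)
```

Lowering a cost makes that kind of slip count as "closer", so two numbers that differ only by a likely human error score lower than two that differ by an unlikely one.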


Treat the numbers as sequences of digits and calculate the similarity ratio between them: 2.0*M / T, where T is the total number of digits in both numbers and M is the number of matching digits between them.

A similarity ratio of 0.6 or above means the two numbers are similar.

Note that the ratio is 1 if the numbers are identical, and 0 if they have no digit in common.
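The ratio described above is what Python's standard-library `difflib.SequenceMatcher.ratio()` returns, so a sketch can lean on it directly. The helper names `digit_similarity` and `is_similar` are hypothetical, and the 0.6 threshold is simply the cut-off suggested above.

```python
from difflib import SequenceMatcher


def digit_similarity(a: str, b: str) -> float:
    """Return 2.0*M / T for two digit strings.

    T is the total number of digits in both strings; M is the number of
    matching digits in the blocks SequenceMatcher aligns.
    """
    return SequenceMatcher(None, a, b).ratio()


def is_similar(a: str, b: str, threshold: float = 0.6) -> bool:
    """Apply the 0.6 cut-off suggested in the answer."""
    return digit_similarity(a, b) >= threshold


print(digit_similarity("12345", "12355"))  # 0.8 (M = 4, T = 10)
print(is_similar("12345", "67890"))        # False (no digits in common)
```

M here counts the digits in the matching blocks that SequenceMatcher finds with its greedy longest-block search, which is not guaranteed to be a true longest common subsequence, so the ratio can occasionally differ from a hand count.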
