
Regex to find words with one character diff

I have a word dictionary and I'm looking for a regex that can help me find words that differ by only one character. For example, for the word BIG it could match BIT, BUG, etc. The words must be the same length.

Thank you!


/\b([a-z]ig|b[a-z]g|bi[a-z])\b/i

You'd have to build a pattern like this for every word in the dictionary, and note that it also matches the original word itself. Regex alone is probably not the best tool for this job.
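A pattern of that shape does not have to be typed by hand for each word. Here is a minimal sketch of generating it, assuming lowercase ASCII letters only; the function name `one_off_pattern` and the sample word list are made up for illustration.

```python
import re

def one_off_pattern(word):
    """Build a pattern with one alternative per position, where that
    position is replaced by the character class [a-z]."""
    alternatives = [
        re.escape(word[:i]) + "[a-z]" + re.escape(word[i + 1:])
        for i in range(len(word))
    ]
    return re.compile("(?:" + "|".join(alternatives) + ")", re.IGNORECASE)

# Hypothetical sample dictionary, not from the question.
dictionary = ["big", "bit", "bug", "bag", "beg", "dig", "bigger"]

pattern = one_off_pattern("big")
# fullmatch() enforces equal length; the word itself is filtered out
# separately because [a-z] also accepts the original letter.
matches = [w for w in dictionary if pattern.fullmatch(w) and w.lower() != "big"]
print(matches)
```

Using `fullmatch` instead of `\b` anchors works here because each dictionary entry is tested on its own rather than searched for inside running text.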


Use something like this, perhaps?

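>>> def word_difference(word1, word2):
...     # Return (index, char1, char2) for every position where the words differ.
...     # zip() stops at the shorter word, so check the lengths separately if needed.
...     return [(i, a, b) for i, (a, b) in enumerate(zip(word1, word2)) if a != b]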
>>> word_difference("foo", "bar")
[(0, 'f', 'b'), (1, 'o', 'a'), (2, 'o', 'r')]
>>> word_difference("big", "bug")
[(1, 'i', 'u')]

The length of the returned list is the number of characters that differ, so two words differ by exactly one character when that length is 1. I assume this is what you want, since you didn't state whether the characters may be in different positions or not. If they may, that's just as easy: you can use sets.
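For the position-independent case, a plain set comparison such as `set(word1) ^ set(word2)` works when no letter repeats, but sets drop duplicates. The sketch below swaps in `collections.Counter` for that reason; the function name `unordered_difference` is hypothetical.

```python
from collections import Counter

def unordered_difference(word1, word2):
    """Return the letters found only in word1 and only in word2,
    ignoring position but respecting how often each letter occurs."""
    c1, c2 = Counter(word1), Counter(word2)
    only_in_first = sorted((c1 - c2).elements())
    only_in_second = sorted((c2 - c1).elements())
    return only_in_first, only_in_second

print(unordered_difference("big", "gub"))  # (['i'], ['u'])
print(unordered_difference("foo", "oof"))  # ([], [])
```

Two equal-length words differ by one letter, regardless of order, when each returned list has exactly one element.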


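I arrived at nearly the same solution as the one posted on ideone. But since vkolodrevskiy wrote "to get words with only one character diff", I made the pattern respect that: the changed position must hold a different letter, so the word itself does not match.

My code is in Python, since the question does not specify a language.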

import re

word = 'main'

# One alternative per position: any letter except the original one at that position.
RE = '|'.join(word[0:i] + '(?!' + char + ')[a-z]' + word[i+1:] for i, char in enumerate(word))
RE = '(' + RE + ')'
print(RE)

ch = 'the main reason is pain due to rain. hello muin, where is maih ?'

print(re.findall(RE, ch))


Well, you could write a bunch of complicated regular expressions, or ingenious ones, but I found something that may be a lot easier.

Check out the Levenshtein module to get the Hamming distance between two strings. Then just keep the words that have a distance of one.

To install it you can use pip install python-levenshtein. On Ubuntu or similar you can use sudo apt-get install python-levenshtein. On Windows, installing it with pip requires a C++ compiler (such as Visual C++ 2010 Express if you're using Python 3, or Visual C++ 2008 Express for Python 2.x; both can be downloaded for free from Microsoft).

import Levenshtein #Note the capital L
help(Levenshtein) #See the documentation
Levenshtein.hamming("cat", "sat") #Returns 1; they must be the same length, as you specified

There are lots of other useful functions besides hamming. Read the help (via the help call in the code above); the functions are surprisingly well documented. Press q to quit the help.
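If installing a C extension is not an option, the same check can be approximated with the standard library alone. This is a rough sketch and not the Levenshtein module's implementation; the names `hamming` and `one_char_neighbours` and the sample word list are assumptions made for illustration.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def one_char_neighbours(word, dictionary):
    """Return the dictionary words that differ from word in exactly one position."""
    return [
        w for w in dictionary
        if len(w) == len(word) and hamming(w, word) == 1
    ]

# Hypothetical sample dictionary, not from the question.
words = ["big", "bit", "bug", "bag", "bigger", "dog"]
print(one_char_neighbours("big", words))  # ['bit', 'bug', 'bag']
```

The length check comes first so that `hamming` is never called on words of different lengths.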


In the end I did not use a regex. My solution looks like this:

public boolean diffOneChar(String word1, String word2) {
    int diff=0;
    if(word1 == null || word2 == null) return false;
    if(word1.length() == 0 || word2.length() == 0) return false;
    if(word1.length() != word2.length()) return false;

    for(int i=0; i<word1.length(); i++) {
        if(word1.charAt(i)!=word2.charAt(i))
            diff++;
    }

    return diff == 1;
}