Regular Expression to find words in varying orders

I am looking for a regular expression that matches both of these strings when I search for "sun shining":

  1. the sun is shining
  2. a shining sun is nice


I'd use positive lookaheads for each word, like this (and you can add as many as you like):

(?=.*?\bsun\b)(?=.*?\bshining\b).*
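
As a quick check, here is a minimal Python sketch of that pattern using the standard `re` module. The helper name `matches_both` is made up for illustration, and the third sample string is an extra negative case that is not from the question.

```python
import re

# Each lookahead scans ahead from the same position for one whole word,
# so the order in which the words appear does not matter.
PATTERN = re.compile(r"(?=.*?\bsun\b)(?=.*?\bshining\b).*")

def matches_both(text):
    """Return True if text contains both whole words, in any order."""
    return PATTERN.search(text) is not None

samples = [
    "the sun is shining",
    "a shining sun is nice",
    "a sunny day, shining bright",  # "sunny" is not the whole word "sun"
]
for sample in samples:
    print(sample, "->", matches_both(sample))
```

Adding another required word should only need one more `(?=.*?\bword\b)` group in front of the final `.*`.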


Basic regular expressions don't handle differing orders of words very well. There are ways to do it but the regular expressions become ugly and unreadable to all but the regex gurus. I prefer to opt for readability in most cases myself.

My advice would be to use a simple alternation (an "or" of the two orders), something like:

sun.+shining|shining.+sun

with word boundaries if necessary:

\bsun\b.+\bshining\b|\bshining\b.+\bsun\b
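
To see what the word boundaries change, here is a small Python sketch that runs both variants against the same inputs. The names `LOOSE`, `STRICT` and `found` are illustrative only, and "sunny and shining" is an assumed extra test string.

```python
import re

# Alternation without and with word boundaries, as written above.
LOOSE = re.compile(r"sun.+shining|shining.+sun")
STRICT = re.compile(r"\bsun\b.+\bshining\b|\bshining\b.+\bsun\b")

def found(pattern, text):
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None

for text in ["the sun is shining", "a shining sun is nice", "sunny and shining"]:
    print(text, "| loose:", found(LOOSE, text), "| strict:", found(STRICT, text))
```

The loose form accepts "sunny and shining" because `sun` matches inside "sunny"; the bounded form rejects it.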

As Lucero points out, this will become unwieldy as the number of words you're searching for increases, in which case I would go for the multiple regex match solution:

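import re

def has_all_words(string, words):
    # Test each word separately; stop at the first one that is missing.
    for word in words:
        # re.escape keeps any special characters in the word literal.
        if not re.search(r"\b" + re.escape(word) + r"\b", string):
            return False
    return True

That function runs a check for each word and ensures that all of them appear.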


You will need to use a regular expression that considers every permutation like this:

\b(sun\b.+\bshining|shining\b.+\bsun)\b

Here the word boundaries \b ensure that only the whole words sun and shining match, and not parts of longer words such as "sunny".
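
For more than two words, writing out every ordering by hand gets tedious, since n words give n! branches. A rough Python sketch that generates the alternation instead is shown here; `permutation_pattern` is a hypothetical helper, and it puts boundaries around each word individually, which should be equivalent in effect to the pattern above.

```python
import itertools
import re

def permutation_pattern(words):
    """Build one alternation with a branch for every ordering of words."""
    branches = []
    for ordering in itertools.permutations(words):
        # Join the escaped whole words with ".+" so other text may sit between them.
        bounded = (r"\b" + re.escape(word) + r"\b" for word in ordering)
        branches.append(".+".join(bounded))
    return re.compile("|".join(branches))

pattern = permutation_pattern(["sun", "shining"])
print(pattern.pattern)
print(bool(pattern.search("a shining sun is nice")))
```

Because the branch count grows factorially, this is probably only practical for a handful of words.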


You use two regexes.

if ( ( $line =~ /\bsun\b.+\bshining\b/ ) ||
     ( $line =~ /\bshining\b.+\bsun\b/ ) ) {
   # do whatever
}

Sometimes you have to do what seems to be low-tech. Other answers to this question will have you building complex regexes with alternation and lookahead and whatever, but sometimes the best way is to do it the simplest way, and in this case, it's to use two different regexes.

Don't worry about execution speed. Unless you benchmark this solution against other more complicated single-expression solutions, you don't know which is faster. It's incredibly easy to write slow regexes.
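
If you do want to benchmark, one possible setup is sketched below in Python with the standard `timeit` module, standing in for the Perl above. The sample line and the repetition count are arbitrary assumptions, and the lookahead pattern is just one example of a single-expression alternative; no particular timings are implied.

```python
import re
import timeit

# The two simple regexes, one per word order.
SUN_FIRST = re.compile(r"\bsun\b.+\bshining\b")
SHINING_FIRST = re.compile(r"\bshining\b.+\bsun\b")
# One single-expression alternative to compare against.
LOOKAHEAD = re.compile(r"(?=.*?\bsun\b)(?=.*?\bshining\b).*")

def two_regexes(line):
    """Try each word order with its own regex."""
    return bool(SUN_FIRST.search(line) or SHINING_FIRST.search(line))

def single_lookahead(line):
    """Check both words with one lookahead-based regex."""
    return bool(LOOKAHEAD.search(line))

line = "a shining sun is nice"  # arbitrary sample input
for func in (two_regexes, single_lookahead):
    seconds = timeit.timeit(lambda: func(line), number=10_000)
    print(f"{func.__name__}: {seconds:.4f}s for 10,000 calls")
```

Results will depend on the regex engine and on the input, so it is worth timing with lines that resemble your real data.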
