开发者

Regular Expression :match string containing only non repeating words

I have this situation(Java code): 1) a string such as : "A wild adventure" should match. 2) a string with adjacent repeated words: "A wild wild adventure" shouldn't match.

With this regular expression: .* \b(\w+)\b\s*\1\b.* i can match strings containing adjacent repeated words.

How to reverse the situation i.e how to match strings which do not contain adjacent repeat word开发者_如何学JAVAs


Use negative lookahead assertion, (?!pattern).

    String[] tests = {
        "A wild adventure",      // true
        "A wild wild adventure"  // false
    };
    for (String test : tests) {
        System.out.println(test.matches("(?!.*\\b(\\w+)\\s\\1\\b).*"));
    }

Explanation courtesy of Rick Measham's explain.pl:

REGEX: (?!.*\b(\w+)\s\1\b).*
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))

See also

  • regular-expressions.info/Lookarounds

Related questions

  • using regular expression in Java
    • Uses negative lookahead to ensure a string doesn't have a character occuring more than once
  • Java split is eating my characters.
    • Many examples of using assertions
  • How do I convert CamelCase into human-readable names in Java?
    • Very instructive example of using lookarounds

Note

Negative assertions only make sense when there are also other patterns that you want to positively match (see examples above). Otherwise, you can just use boolean complement operator ! to negate matches with whatever pattern you were using before.

String[] tests = {
    "A wild adventure",      // true
    "A wild wild adventure"  // false
};
for (String test : tests) {
    System.out.println(!test.matches(".*\\b(\\w+)\\s\\1\\b.*"));
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜