
How do I count odd and even amounts of characters with regular expressions?

I'm trying to pull out all strings which have an even number of B's and an odd number of C's. I have the regexes to match odd A's and even B's but I cannot get the two to work together. The strings are delimited by whitespace (tabs, newlines, spaces).

e.g.

XABBAC     ABCDEBCC ABSDERERES ABBAAJSER     HGABAA

I have for odd A's

\b[^A]*A([^A]*A[^A]*A)*[^A]*\b

And for even B's

\b[^B]*(B[^B]*B[^B]*)*[^B]*\b

I know I need to use a positive lookahead and have tried:

\b(?=[^A]*A([^A]*A[^A]*A)*[^A]*\b)[^B]*(B[^B]*B[^B]*)*[^B]*\b

but it doesn't work - does anybody know why?


The problem is that your regexes (regexen?) can match zero characters - \b\b will match on a single word boundary, and so will \b{someregexthatcanmatchzerocharacters}\b.
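
To see the zero-width match concretely, here is a small sketch that runs the question's even-B's pattern against the single-character input "B". The class name ZeroWidthDemo and the helper firstMatch are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // The question's "even B's" pattern, unchanged.
    static final Pattern EVEN_BS = Pattern.compile("\\b[^B]*(B[^B]*B[^B]*)*[^B]*\\b");

    // Hypothetical helper: returns the first match in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "B" contains one B (odd), yet find() succeeds with an empty match at index 0.
        System.out.println("[" + firstMatch(EVEN_BS, "B") + "]");
        // Two word boundaries in a row also match without consuming anything.
        System.out.println("[" + firstMatch(Pattern.compile("\\b\\b"), "B") + "]");
    }
}
```

Both lines print [], so a successful find() here says nothing about how many B's the word contains.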


As Anon already mentioned: your pattern can match the empty string, so m.find() can succeed without consuming any of the target string. So you need to make your even-B's pattern actually match strings containing 2, 4, 6, ... B's. If you want, you can alternate between an even number of B's and this: [^B\\s]+ (which matches strings containing zero B's). As long as you actually match one or more characters with it, you should be okay.

Also, you don't want the negated classes inside the lookahead to match spaces: that way you get too many matches.
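
A quick way to check the whitespace point is to run the question's odd-A's pattern, with and without \\s in the negated classes, over a two-word input. This is only a sketch: the class name WhitespaceDemo, the helper firstMatch and the input "XA BB" are invented for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // Odd A's as written in the question: [^A] also matches spaces.
    static final Pattern LOOSE =
        Pattern.compile("\\b[^A]*A([^A]*A[^A]*A)*[^A]*\\b");
    // Same pattern with whitespace excluded from every negated class.
    static final Pattern STRICT =
        Pattern.compile("\\b[^A\\s]*A([^A\\s]*A[^A\\s]*A)*[^A\\s]*\\b");

    // Hypothetical helper: returns the first match in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + firstMatch(LOOSE, "XA BB") + "]");  // prints [XA BB]
        System.out.println("[" + firstMatch(STRICT, "XA BB") + "]"); // prints [XA]
    }
}
```

The loose version runs straight through the space and swallows the second word; the strict version stops at the end of the first word.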

Try something like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "XABBAC     ABCDEBCC ABSDERERES ABBAAJSER     HGABAA";

String oddAs = "\\b[^A\\s]*A([^A\\s]*A[^A\\s]*A)*[^A\\s]*\\b";
String evenBs = "\\b([^B\\s]*(B[^B\\s]*B[^B\\s]*)+|[^B\\s]+)\\b";

Pattern p = Pattern.compile(String.format("(?=%s)(?=%s)\\S+", oddAs, evenBs));
Matcher m = p.matcher(text);

while (m.find()) {
    System.out.println(m.group());
}

which produces:

ABCDEBCC
ABBAAJSER


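With Apache Commons Lang's StringUtils it's even more concise:

// In Commons Lang 3 the package is org.apache.commons.lang3 instead.
import static org.apache.commons.lang.StringUtils.countMatches;

String data = "XABBAC     ABCDEBCC ABSDERERES ABBAAJSER    HGABAA";
String[] items = data.split("\\s+");

for (String item : items) {
    // Keep only words with an even number of B's and an odd number of C's.
    if (countMatches(item, "B") % 2 == 0
     && countMatches(item, "C") % 2 != 0) {
        System.out.println(item);
    }
}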


regex is overrated

    String str = "XABBAC     ABCDEBCC ABSDERERES ABBAAJSER     HGABAA";
    // Split on runs of whitespace, then count B's and C's in each word.
    for (String token : str.split("\\s+")) {
        int countB = 0;
        int countC = 0;
        for (char c : token.toCharArray()) {
            if (c == 'B') countB++;
            if (c == 'C') countC++;
        }
        if (countC % 2 != 0)
            System.out.println(token + " has odd C");
        if (countB % 2 == 0)
            System.out.println(token + " has even B");
    }