Regex to capture groups

A group can take one of three forms: x/y, x.y, or x_y.z. Groups are separated from each other by an underscore and can appear in any order.

Example:

ABC/DEF_abc.def_PQR/STU_ghi_jkl.mno

I would like to capture the following:

ABC/DEF
abc.def
PQR/STU
ghi_jkl.mno

I have done this with a fairly verbose string iteration and parsing method (shown below), but I am wondering whether a simple regex can accomplish the same thing.

private static ArrayList<String> go(String s){
    ArrayList<String> list = new ArrayList<String>();
    boolean inSlash = false;
    int pos = 0 ;
    boolean inDot = false;
    for(int i = 0 ; i < s.length(); i++){
        char c = s.charAt(i);
        switch (c) {
        case '/':
            inSlash = true;
            break;
        case '_':
            if(inSlash){
                list.add(s.substring(pos,i));
                inSlash = false;
                pos = i+1 ;
            }
            else if (inDot){
                list.add(s.substring(pos,i));
                inDot = false;
                pos = i+1;
            }
            break;
        case '.':
            inDot = true;
            break;
        default:
            break;
        }

    }
    list.add(s.substring(pos));
    System.out.println(list);
    return list;
}


Try this:

((?:[^_./]+/[^_./]+)|(?:[^_./]+\.[^_./]+)|(?:[^_./]+(?:_[^_./]+)+\.[^_./]+))

I don't know Java syntax, but in Perl:

#!/usr/bin/perl
use 5.10.1;
use strict;
use warnings;

my $str = q!ABC/DEF_abc.def_PQR/STU_ghi_jkl.mno_a_b_c.z_a_b_c_d.z_a_b_c_d_e.z!;
my $re = qr!((?:[^_./]+/[^_./]+)|(?:[^_./]+\.[^_./]+)|(?:[^_./]+(?:_[^_./]+)+\.[^_./]+))!;
while($str=~/$re/g) {
    say $1;
}

will produce:

ABC/DEF
abc.def
PQR/STU
ghi_jkl.mno
a_b_c.z
a_b_c_d.z
a_b_c_d_e.z
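
For Java, the same pattern could probably be applied along the following lines. This is only a sketch: the class name GroupExtractor and the method name extract are made up for illustration, and the outer capturing group is dropped because Matcher.group() already returns the whole match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the question.
class GroupExtractor {
    // Same three alternatives as the Perl pattern: x/y, x.y, or x_y(_...).z
    private static final Pattern GROUP = Pattern.compile(
        "[^_./]+/[^_./]+"
        + "|[^_./]+\\.[^_./]+"
        + "|[^_./]+(?:_[^_./]+)+\\.[^_./]+");

    static List<String> extract(String s) {
        List<String> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(s);
        // find() scans left to right, so groups come back in input order
        while (m.find()) {
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [ABC/DEF, abc.def, PQR/STU, ghi_jkl.mno]
        System.out.println(extract("ABC/DEF_abc.def_PQR/STU_ghi_jkl.mno"));
    }
}
```

Note that the backslashes have to be doubled inside a Java string literal.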


There might be a problem with the underscore since it's not always a separator.

Maybe: ((?<=_)\w+_)?\w+[./]\w+


This regex would probably do (tested with .NET regular expressions):

[a-zA-Z]+[./][a-zA-Z]+|[a-zA-Z]+_[a-zA-Z]+\.[a-zA-Z]+

(If you know your input is well formed, there is no need to match the separator explicitly.)


This one uses a positive lookahead instead of alternation:

[A-Za-z]+(_(?=[A-Za-z]+\.[A-Za-z]+))?[A-Za-z]+[/.][A-Za-z]+
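
A rough Java sketch of how the lookahead pattern might be used is shown here. The class name LookaheadExtractor and the method name extract are hypothetical. The whole match is read with Matcher.group(), because the only capturing group in the pattern holds just the optional underscore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the answer.
class LookaheadExtractor {
    // The underscore is consumed only when "letters.letters" follows it,
    // i.e. only when it sits inside an x_y.z group.
    private static final Pattern GROUP = Pattern.compile(
        "[A-Za-z]+(_(?=[A-Za-z]+\\.[A-Za-z]+))?[A-Za-z]+[/.][A-Za-z]+");

    static List<String> extract(String s) {
        List<String> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(s);
        while (m.find()) {
            groups.add(m.group()); // whole match, not capturing group 1
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [ABC/DEF, abc.def, PQR/STU, ghi_jkl.mno]
        System.out.println(extract("ABC/DEF_abc.def_PQR/STU_ghi_jkl.mno"));
    }
}
```

As written, the pattern accepts only ASCII letters and at most one underscore inside a group. It also needs at least two letters before the / or . in the x/y and x.y forms, so an input such as a.b is not matched.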