Regex to capture groups
My group could either be of the form x/y, x.y or x_y.z. Ea开发者_运维技巧ch group is separated by an underscore. The groups are unordered.
Example:
ABC/DEF_abc.def_PQR/STU_ghi_jkl.mno
I would like to capture the following:
ABC/DEF
abc.def
PQR/STU
ghi_jkl.mno
I have done this using a fairly verbose string iteration and parsing method (shown below), but am wondering if a simple regex can accomplish this.
private static ArrayList<String> go(String s){
ArrayList<String> list = new ArrayList<String>();
boolean inSlash = false;
int pos = 0 ;
boolean inDot = false;
for(int i = 0 ; i < s.length(); i++){
char c = s.charAt(i);
switch (c) {
case '/':
inSlash = true;
break;
case '_':
if(inSlash){
list.add(s.substring(pos,i));
inSlash = false;
pos = i+1 ;
}
else if (inDot){
list.add(s.substring(pos,i));
inDot = false;
pos = i+1;
}
break;
case '.':
inDot = true;
break;
default:
break;
}
}
list.add(s.substring(pos));
System.out.println(list);
return list;
}
Have a try with:
((?:[^_./]+/[^_./]+)|(?:[^_./]+\.[^_./]+)|(?:[^_./]+(?:_[^_./]+)+\.[^_./]+))
I don't know java syntax but in Perl:
#!/usr/bin/perl
use 5.10.1;
use strict;
use warnings;
my $str = q!ABC/DEF_abc.def_PQR/STU_ghi_jkl.mno_a_b_c.z_a_b_c_d.z_a_b_c_d_e.z!;
my $re = qr!((?:[^_./]+/[^_./]+)|(?:[^_./]+\.[^_./]+)|(?:[^_./]+(?:_[^_./]+)+\.[^_./]+))!;
while($str=~/$re/g) {
say $1;
}
will produce:
ABC/DEF
abc.def
PQR/STU
ghi_jkl.mno
a_b_c.z
a_b_c_d.z
a_b_c_d_e.z
There might be a problem with the underscore since it's not always a separator.
Maybe: ((?<=_)\w+_)?\w+[./]\.w+
This regex would probably do (tested with .Net regular expressions):
[a-zA-Z]+[./][a-zA-Z]+|[a-zA-Z]+_[a-zA-Z]+\.[a-zA-Z]+
(If you know your input is well formed there is no need to explicitly match the separator)
This one goes with positive lookahead instead of alternations
[A-Za-z]+(_(?=[A-Za-z]+\.[A-Za-z]+))?[A-Za-z]+[/.][A-Za-z]+
精彩评论