Java regex: Repeating capturing groups
An item is a comma-delimited list of one or more strings of digits or letters, e.g.
"12"
"abc"
"12,abc,3"
I'm trying to match a bracketed list of zero or more items in Java e.g.
""
"(12)"
"(abc,12)"
"(abc,12),(30,asdf)"
"(qqq,pp),(abc,12),(30,asdf,2),"
For the last example, this should return the following matching groups:
qqq,pp
abc,12
30,asdf,2
I've come up with the following (incorrect) pattern
\((.+?)\)(?:,\((.+?)\))*
which matches only the following for the last example
qqq,pp
30,asdf,2
Tips? Thanks
That's right. You can't have a "variable" number of capturing groups in a Java regular expression. Your Pattern has two groups:
\((.+?)\)(?:,\((.+?)\))*
  |___|        |___|
 group 1      group 2
Each group will contain the content of the last match for that group. I.e., abc,12 will get overridden by 30,asdf,2.
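The overwriting can be observed directly. The following is a minimal sketch that runs the pattern from the question against the last example; the class and method names (RepeatedGroupDemo, groups) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The pattern from the question: group 2 sits inside a repeated non-capturing group.
    static final Pattern PATTERN = Pattern.compile("\\((.+?)\\)(?:,\\((.+?)\\))*");

    // Returns { group 1, group 2 } for the first match, or null if nothing matches.
    static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("(qqq,pp),(abc,12),(30,asdf,2),");
        System.out.println("group 1 = " + g[0]); // qqq,pp
        System.out.println("group 2 = " + g[1]); // 30,asdf,2 (abc,12 was overwritten)
    }
}
```

Group 2 is null when the repeated part never matches, for example for the input (12).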
Related question:
- Regular expression with variable number of groups?
The solution is to use one expression (something like \((.+?)\)) and use matcher.find to iterate over the matches.
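A minimal sketch of that loop, assuming the single-group pattern \((.+?)\) suggested above; the class and method names (BracketedItems, extract) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketedItems {
    // One capturing group; find() is called repeatedly instead of repeating the group.
    static final Pattern ITEM = Pattern.compile("\\((.+?)\\)");

    // Collects the content of every bracketed item, in order of appearance.
    static List<String> extract(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        // Prints each item on its own line: qqq,pp / abc,12 / 30,asdf,2
        for (String item : extract("(qqq,pp),(abc,12),(30,asdf,2),")) {
            System.out.println(item);
        }
    }
}
```

Note that this only extracts the items; it does not check that the whole input is a well-formed list (separators between the brackets are simply skipped).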
You can use a regular expression like ([^,]+) in a loop, or just str.split(",") to get all elements at once. This version, str.split("\\s*,\\s*"), even allows spaces around the commas.
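A short sketch of the split approach applied to one item. The names (SplitItem, elements) are made up for illustration, and the trim() call is an addition: the split pattern only removes whitespace around the commas, not at the ends of the string.

```java
import java.util.Arrays;
import java.util.List;

class SplitItem {
    // Splits one item such as "30, asdf ,2" into its elements.
    static List<String> elements(String item) {
        // trim() drops leading/trailing whitespace that the split pattern would keep.
        return Arrays.asList(item.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        // Prints each element on its own line: 30 / asdf / 2
        for (String element : elements(" 30, asdf ,2 ")) {
            System.out.println(element);
        }
    }
}
```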
To match a run of the same word repeated several times, use (^|\s+)(\S*)(($|\s+)\2)+ with the ignore-case option /i
Test string: She left LEft leFT LEFT now
Example here - https://regex101.com/r/FEmXui/2
Match 1
Full match 3-23 ` left LEft leFT LEFT`
Group 1. 3-4 ` `
Group 2. 4-8 `left`
Group 3. 18-23 ` LEFT`
Group 4. 18-19 ` `
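The same pattern can be tried in Java with Pattern.CASE_INSENSITIVE in place of /i. This is a sketch only: the test string is assumed to be "She left LEft leFT LEFT now" (inferred from the reported full match at offsets 3-23), and the names (RepeatedWords, firstRun) are made up. It also shows the point made earlier in this thread: group 3 is repeated, so it keeps only its last repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWords {
    // Backreference \2 is compared case-insensitively because of the flag.
    static final Pattern RUN = Pattern.compile(
            "(^|\\s+)(\\S*)(($|\\s+)\\2)+", Pattern.CASE_INSENSITIVE);

    // Returns { full match, group 2, group 3 } for the first run, or null if there is none.
    static String[] firstRun(String input) {
        Matcher m = RUN.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] run = firstRun("She left LEft leFT LEFT now");
        System.out.println("full match: '" + run[0] + "'"); // ' left LEft leFT LEFT'
        System.out.println("group 2:    '" + run[1] + "'"); // 'left'
        System.out.println("group 3:    '" + run[2] + "'"); // ' LEFT' (last repetition only)
    }
}
```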
An ANTLR grammar can solve this problem. It is really beyond the reasonable capabilities of regular expressions, although I believe some newer versions of Microsoft's implementation in .NET support this behavior. See this other SO question. If you're on anything other than .NET, your best option is going to be a parser generator (you don't have to use ANTLR; that's just my personal preference). The ANTLR4 GitHub page can help you get started on matching more complex expressions with things like repeating match groups. Another option that doesn't require much new learning is to tokenize the input string and pull out the pieces you want, but this can get extremely messy and produce nightmarish chunks of parsing code that are better suited to a generated parser.
This may be the solution:
package com.drl.fw.sch;

import java.util.regex.Pattern;

// Note: Matcher and SimpleStringMatcher are the poster's own types (not shown here);
// Matcher is not java.util.regex.Matcher.
public class AngularJSMatcher extends SimpleStringMatcher {
    // Matches the alternative spelling (camelCase <-> hyphenated); null if there is none.
    Matcher delegate;

    public AngularJSMatcher(String lookFor) {
        super(lookFor);
        int ind = lookFor.indexOf('-');
        if (ind >= 0) {
            // Hyphenated input, e.g. ng-repeat: build the camelCase form.
            StringBuilder sb = new StringBuilder();
            boolean first = true;
            for (String s : lookFor.split("-")) {
                if (first) {
                    sb.append(s);
                    first = false;
                } else {
                    if (s.length() > 1) {
                        sb.append(s.substring(0, 1).toUpperCase());
                        sb.append(s.substring(1));
                    } else {
                        sb.append(s.toUpperCase());
                    }
                }
            }
            delegate = new SimpleStringMatcher(sb.toString());
        } else {
            // camelCase input: split before upper-case letters and join with hyphens.
            String[] words = lookFor.split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])");
            if (words.length > 1) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < words.length; i++) {
                    sb.append(words[i].toLowerCase());
                    if (i < words.length - 1) sb.append("-");
                }
                delegate = new SimpleStringMatcher(sb.toString());
            }
        }
    }

    @Override
    public boolean match(String in) {
        // Try the spelling as given first, then the alternative spelling.
        if (super.match(in)) return true;
        if (delegate != null && delegate.match(in)) return true;
        return false;
    }

    public static void main(String[] args) {
        String lookfor = "ngRepeatStart";
        Matcher matcher = new AngularJSMatcher(lookfor);
        System.out.println(matcher.match("<header ng-repeat-start=\"item in items\">"));
        System.out.println(matcher.match("var ngRepeatStart=\"item in items\">"));
    }
}