This RegEx captures the wrong number of groups
I have to parse a string and capture some values:
FREQ=WEEKLY;WKST=MO;BYDAY=2TU,2WE
I want to capture 2 groups:
grp 1: 2, 2
grp 2: TU, WE
The numbers represent intervals; TU and WE represent weekdays. I need both.
I'm using this code:
private final static java.util.regex.Pattern regBYDAY = java.util.regex.Pattern.compile(".*;BYDAY=(?:([+-]?[0-9]*)([A-Z]{2}),?)*.*");
String rrule = "FREQ=WEEKLY;WKST=MO;BYDAY=2TU,2WE";
java.util.regex.Matcher result = regBYDAY.matcher(rrule);
if (result.matches())
{
int grpCount = result.groupCount();
for (int i = 1; i <= grpCount; i++) // group indices run from 1 to groupCount() inclusive
{
String g = result.group(i);
...
}
}
grpCount == 2. Why? If I read the Java documentation correctly (the little of it I have read), I should get 5: 0 = the whole expression, and 1, 2, 3, 4 = my captures 2, 2, TU and WE.
result.group(1) == "2";
I'm a C# programmer with very little Java experience, so I tested the RegEx in the "Regular Expression Workbench", a great C# program for testing RegEx. There my RegEx works fine.
https://code.msdn.microsoft.com/RegexWorkbench
RegExWB:
.*;BYDAY=(?:([+-]?[0-9]*)([A-Z]{2}),?)*.*
Matching:
FREQ=WEEKLY;WKST=MO;BYDAY=22TU,-2WE,+223FR
1 => 22
1 => -2
1 => +223
2 => TU
2 => WE
2 => FR
You can also split the work into two simpler patterns. This improves readability and, up to a point, makes the code independent of the regex implementation, because it relies only on a more common regexp subset:
final Pattern re1 = Pattern.compile(".*;BYDAY=(.*)");
final Pattern re2 = Pattern.compile("(?:([+-]?[0-9]*)([A-Z]{2}),?)");
final Matcher matcher1 = re1.matcher(rrule);
if ( matcher1.matches() ) {
final String group1 = matcher1.group(1);
Matcher matcher2 = re2.matcher(group1);
while(matcher2.find()) {
System.out.println("group: " + matcher2.group(1) + " " +
matcher2.group(2));
}
}
Your regex works the same in Java as it does in C#; it's just that in Java you can only access the final capture for each group. In fact, .NET is one of only two regex flavors I know of that let you retrieve intermediate captures (Perl 6 being the other).
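To make the "final capture only" behavior concrete, here is a minimal sketch that runs the pattern from the question against the longer test string. The class name LastCaptureDemo and the helper lastCaptures are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // The pattern from the question, unchanged: two capturing groups
    // inside a repeated non-capturing group.
    static final Pattern BYDAY = Pattern.compile(
            ".*;BYDAY=(?:([+-]?[0-9]*)([A-Z]{2}),?)*.*");

    // Hypothetical helper: returns {groupCount, group(1), group(2)},
    // or null if the whole input does not match.
    static String[] lastCaptures(String rrule) {
        Matcher m = BYDAY.matcher(rrule);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            String.valueOf(m.groupCount()), m.group(1), m.group(2)
        };
    }

    public static void main(String[] args) {
        String[] r = lastCaptures("FREQ=WEEKLY;WKST=MO;BYDAY=22TU,-2WE,+223FR");
        System.out.println("groupCount = " + r[0]); // 2
        System.out.println("group(1)   = " + r[1]); // +223
        System.out.println("group(2)   = " + r[2]); // FR
    }
}
```

groupCount() counts the capturing groups written in the pattern, not the number of times they matched, so it stays at 2 however many weekdays the string contains. Each group keeps only what it captured on the last pass through the repetition.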
This is probably the simplest way to do what you want in Java:
String s= "FREQ=WEEKLY;WKST=MO;BYDAY=22TU,-2WE,+223FR";
Pattern p = Pattern.compile("(?:;BYDAY=|,)([+-]?[0-9]+)([A-Z]{2})");
Matcher m = p.matcher(s);
while (m.find())
{
System.out.printf("Interval: %5s, Day of Week: %s%n",
m.group(1), m.group(2));
}
Here's the equivalent C# code, in case you're interested:
string s = "FREQ=WEEKLY;WKST=MO;BYDAY=22TU,-2WE,+223FR";
Regex r = new Regex(@"(?:;BYDAY=|,)([+-]?[0-9]+)([A-Z]{2})");
foreach (Match m in r.Matches(s))
{
Console.WriteLine("Interval: {0,5}, Day of Week: {1}",
m.Groups[1], m.Groups[2]);
}
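If the pairs are needed as values rather than printed lines, the Java find() loop above can collect them into a list. This is only a sketch: the ByDayParser class and its ByDay holder are hypothetical names invented for the example, and it assumes Java 7 or later, where Integer.parseInt accepts a leading "+".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ByDayParser {
    /** Hypothetical holder for one BYDAY entry; not part of any library. */
    static final class ByDay {
        final int interval;
        final String day;

        ByDay(int interval, String day) {
            this.interval = interval;
            this.day = day;
        }

        @Override
        public String toString() {
            return interval + " " + day;
        }
    }

    // Same pattern as in the answer: one match per "<number><weekday>" entry.
    // Entries without a numeric prefix (e.g. a bare "MO") are not matched.
    private static final Pattern ENTRY =
            Pattern.compile("(?:;BYDAY=|,)([+-]?[0-9]+)([A-Z]{2})");

    static List<ByDay> parse(String rrule) {
        List<ByDay> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(rrule);
        while (m.find()) {
            // group(1) is the signed interval, group(2) the two-letter weekday.
            result.add(new ByDay(Integer.parseInt(m.group(1)), m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [22 TU, -2 WE, 223 FR]
        System.out.println(parse("FREQ=WEEKLY;WKST=MO;BYDAY=22TU,-2WE,+223FR"));
    }
}
```

Because each call to find() produces a fresh match, every entry gets its own group(1) and group(2), which sidesteps the "last capture only" limitation.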
I'm a bit rusty, but I'll propose two caveats. First of all, regexps come in various dialects. There is a fantastic O'Reilly book about this, and there is a chance that your C# utility applies slightly different rules.
As an example, I used a similar (but different) tool and discovered that it parsed things differently...
First of all, it rejected your regexp (maybe a typo?): the initial "*" does not make sense unless you put a dot (.) in front of it, like this:
.*;BYDAY=(?:([+-]?[0-9]*)([A-Z]{2}),?)*.*
Now it was accepted, but it "matched" only the 2/WE part and "skipped" the 2/TU pair.
(I suggest you read about greedy and non-greedy matching to understand this a bit better.)
Therefore I updated your pattern as follows:
.*;BYDAY=(?:([+-]?[0-9]*)([A-Z]{2}),?),(?:([+-]?[0-9]*)([A-Z]{2}),?)*.*
And now it works and correctly captures 2,TU,2 and WE.
Maybe this helps?