How do I parse a recurring pattern with regex?
I want to use a regex to find an unknown number of arguments in a string. It is easier to show than to explain, so here is an example:
The regex: @ISNULL\('(.*?)','(.*?)','(.*?)'\)
The input: @ISNULL('1','2','3')
The result:
Group[0] "@ISNULL('1','2','3')" at 0 - 20
Group[1] "1" at 9 - 10
Group[2] "2" at 13 - 14
Group[3] "3" at 17 - 18
That works well. The problem begins when the number of arguments is unknown (two or more).
What do I need to change in the regex so that it finds all the arguments that occur in the string?
For example, if I parse the string "@ISNULL('1','2','3','4','5','6')", I want to find all six arguments.
If you don't know the number of potential matches in a repeated construct, you need a regex engine that supports captures in addition to capturing groups. Only .NET and Perl 6 offer this currently.
In C#:
using System;
using System.Text.RegularExpressions;

string pattern = @"@ISNULL\(('([^']*)',?)+\)";
string input = @"@ISNULL('1','2','3','4','5','6')";
Match match = Regex.Match(input, pattern);
if (match.Success) {
    Console.WriteLine("Matched text: {0}", match.Value);
    for (int ctr = 1; ctr < match.Groups.Count; ctr++) {
        // Group.Value holds only the last iteration of a repeated group...
        Console.WriteLine(" Group {0}: {1}", ctr, match.Groups[ctr].Value);
        int captureCtr = 0;
        // ...while Group.Captures keeps every iteration.
        foreach (Capture capture in match.Groups[ctr].Captures) {
            Console.WriteLine(" Capture {0}: {1}",
                              captureCtr, capture.Value);
            captureCtr++;
        }
    }
}
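To illustrate why a plain capturing group is not enough, here is a minimal Java sketch that runs the same pattern in an engine without captures. The class and method names (RepeatedGroupDemo, lastCapturedArgument) are made up for this illustration. Group 2 is overwritten on every repetition, so only the last argument survives:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class RepeatedGroupDemo {
    // Returns what group 2 holds after the whole call has matched,
    // or null if the input contains no @ISNULL(...) call.
    static String lastCapturedArgument(String input) {
        Pattern pattern = Pattern.compile("@ISNULL\\(('([^']*)',?)+\\)");
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        // Only the final iteration of the repeated group is kept.
        System.out.println(lastCapturedArgument("@ISNULL('1','2','3','4','5','6')")); // prints 6
    }
}
```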
In other regex flavors, you have to do it in two steps. E.g., in Java (code snippets courtesy of RegexBuddy):
First, find the part of the string you need:
// subjectString is assumed to hold the text being searched
String ResultString = null;
Pattern regex = Pattern.compile("@ISNULL\\(('([^']*)',?)+\\)");
// or, using non-capturing groups:
// Pattern regex = Pattern.compile("@ISNULL\\((?:'(?:[^']*)',?)+\\)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    ResultString = regexMatcher.group();
}
Then use another regex to find and iterate over your matches:
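List<String> matchList = new ArrayList<String>();
try {
    Pattern regex = Pattern.compile("'([^']*)'");
    Matcher regexMatcher = regex.matcher(ResultString);
    while (regexMatcher.find()) {
        // group(1) is the argument without its surrounding quotes
        matchList.add(regexMatcher.group(1));
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}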
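Put together, the two steps might look like the following self-contained sketch. The class name IsNullArguments, the method extractArguments, and the sample input are assumptions made for this example. It also assumes that arguments never contain a single quote and that only the first @ISNULL call in the string matters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class IsNullArguments {
    // Step 1: matches one complete @ISNULL('...','...') call.
    private static final Pattern CALL =
            Pattern.compile("@ISNULL\\((?:'[^']*',?)+\\)");
    // Step 2: matches a single quoted argument inside that call.
    private static final Pattern ARGUMENT = Pattern.compile("'([^']*)'");

    // Returns the arguments of the first @ISNULL call in subject,
    // or an empty list if there is no such call.
    static List<String> extractArguments(String subject) {
        List<String> arguments = new ArrayList<>();
        Matcher call = CALL.matcher(subject);
        if (call.find()) {
            // Search only inside the matched call, not the whole subject.
            Matcher argument = ARGUMENT.matcher(call.group());
            while (argument.find()) {
                arguments.add(argument.group(1));
            }
        }
        return arguments;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4, 5, 6]
        System.out.println(extractArguments("x = @ISNULL('1','2','3','4','5','6');"));
    }
}
```

Restricting the second regex to the text matched by the first keeps quoted strings elsewhere in the input from being picked up as arguments.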
This answer is somewhat speculative, as I don't know which regex engine you are using. If the parameters are always numbers and always enclosed in single quotes, why not try the digit class, like this:
'(\d)+?'
This is just the \d class with the extraneous @ISNULL part removed, as I assume you are only interested in the parameters themselves. You may not need the +, and of course I don't know whether your engine supports the lazy ? operator; just give it a go. Note that with the quantifier outside the group, a multi-digit argument leaves only its last digit in the group; writing '(\d+)' captures the whole number.
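As a rough sketch of this idea in Java (the engine is an assumption, as are the names DigitArguments and findQuotedNumbers), the loop below collects every quoted number in the string. It uses '(\d+)' with the quantifier inside the group so that multi-digit arguments are captured whole, and it does not check that the numbers actually sit inside an @ISNULL call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DigitArguments {
    // A run of digits enclosed in single quotes; group 1 is the number itself.
    private static final Pattern QUOTED_NUMBER = Pattern.compile("'(\\d+)'");

    // Returns every single-quoted number found anywhere in subject.
    static List<String> findQuotedNumbers(String subject) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = QUOTED_NUMBER.matcher(subject);
        while (matcher.find()) {
            numbers.add(matcher.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // prints [1, 22, 333]
        System.out.println(findQuotedNumbers("@ISNULL('1','22','333')"));
    }
}
```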