How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?
How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?
Example - I'm trying to have the expression match all the b
characters following any number of a
characters:
Regex expression = new Regex("(?<=a).*");
foreach (Match result in expression.Matches("aaabbbb"))
MessageBox.Show(result.Value);
returns aabbbb
, the lookbehind matching only an a
. How can I make it so that it wou开发者_如何学Gold match all the a
s in the beginning?
I've tried
Regex expression = new Regex("(?<=a+).*");
and
Regex expression = new Regex("(?<=a)+.*");
with no results...
What I'm expecting is bbbb
.
Are you looking for a repeated capturing group?
(.)\1*
This will return two matches.
Given:
aaabbbb
This will result in:
aaa
bbbb
This:
(?<=(.))(?!\1).*
Uses the above principal, first checking that the finding the previous character, capturing it into a back reference, and then asserting that that character is not the next character.
That matches:
bbbb
I figured it out eventually:
Regex expression = new Regex("(?<=a+)[^a]+");
foreach (Match result in expression.Matches(@"aaabbbb"))
MessageBox.Show(result.Value);
I must not allow the a
s to me matched by the non-lookbehind group. This way, the expression will only match those b
repetitions that follow a
repetitions.
Matching aaabbbb
yields bbbb
and matching aaabbbbcccbbbbaaaaaabbzzabbb
results in bbbbcccbbbb
, bbzz
and bbb
.
The reason the look-behind is skipping the "a" is because it is consuming the first "a" (but no capturing it), then it captures the rest.
Would this pattern work for you instead? New pattern: \ba+(.+)\b
It uses a word boundary \b
to anchor either ends of the word. It matches at least one "a" followed by the rest of the characters till the word boundary ends. The remaining characters are captured in a group so you can reference them easily.
string pattern = @"\ba+(.+)\b";
foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
Console.WriteLine("Match: " + m.Value);
Console.WriteLine("Group capture: " + m.Groups[1].Value);
}
UPDATE: If you want to skip the first occurrence of any duplicated letters, then match the rest of the string, you could do this:
string pattern = @"\b(.)(\1)*(?<Content>.+)\b";
foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
Console.WriteLine("Match: " + m.Value);
Console.WriteLine("Group capture: " + m.Groups["Content"].Value);
}
精彩评论