RegEx Multiple Matches in Text
I am trying to parse an email, and most of it is working except when a record comes in with multiple errors.
Here's part of the text:
Record #1 with LeadRecordID 4 and MTN of (813) 555-1234 has 4 errors:
   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed.
Record #2 with LeadRecordID 5 and MTN of (813) 555-4321 has 1 errors:
   Shipping Street Address cannot be blank
Here is the RegEx I'm using:
Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of .* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)* (?<Error1>.*)
Edit: If I do this, I get two matches, but the Error group captures only one line per match; it should capture all of the error lines:
Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of .* has (?<NumberOfErrors>\d*) errors:(?:\r\n)(?<Error>.*)(?:\r\n)
Edit 2: This seems to get me a subgroup, thanks.
Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)*(?<Errors>(?:(?<Error>\s{3}[^\r\n]+)(?:\r\n)*)+)
Try this pattern with Regex.Matches:
@"Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)*(?<Errors>(?:\s{3}[^\r\n]+(?:\r\n)*)+)"
Test code:
// Requires: using System; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    string pattern =
        @"Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)*(?<Errors>(?:\s{3}[^\r\n]+(?:\r\n)*)+)";
    // Each error line is indented by three spaces, which \s{3} relies on.
    string message = @"Record #1 with LeadRecordID 4 and MTN of (813) 555-1234 has 4 errors:
   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed.
Record #2 with LeadRecordID 5 and MTN of (813) 555-4321 has 1 errors:
   Shipping Street Address cannot be blank";
    // Regex.Matches returns one Match per record.
    MatchCollection mc = Regex.Matches(message, pattern);
    foreach (Match m in mc)
    {
        Console.WriteLine("RecordNumber = \"{0}\"", m.Groups["RecordNumber"].Value);
        Console.WriteLine("LeadRecordId = \"{0}\"", m.Groups["LeadRecordId"].Value);
        Console.WriteLine("NumberOfErrors = \"{0}\"", m.Groups["NumberOfErrors"].Value);
        Console.WriteLine("Errors = \"{0}\"", m.Groups["Errors"].Value);
        // A second pass splits the captured block into individual error lines.
        MatchCollection errors = Regex.Matches(m.Groups["Errors"].Value, @"\s{3}(?<error>[^\r\n]+)(?:\r\n)*");
        foreach (Match g1 in errors)
        {
            Console.WriteLine(g1.Groups["error"].Value);
        }
        Console.WriteLine("------------------------");
    }
    Console.ReadLine();
}
Result:
RecordNumber = "1"
LeadRecordId = "4"
NumberOfErrors = "4"
Errors = "   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed.
"
Shipping Street Address cannot be blank
Shipping City cannot be blank
Shipping Zipcode cannot be blank
Errors exist in secondary records #2, #3, #4, record not processed.
------------------------
RecordNumber = "2"
LeadRecordId = "5"
NumberOfErrors = "1"
Errors = "   Shipping Street Address cannot be blank"
Shipping Street Address cannot be blank
------------------------
acoolaum's answer is correct, though it runs an additional regular expression for each match. I changed the code so that it uses only one regular expression:
// Requires: using System; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    // The literal \r\n after "errors:" means the message must use CRLF line endings.
    string pattern =
        @"Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:\r\n(?:\s{3}(?<Error>[^\r\n]+)(?:\r\n)*)+";
    // Each error line is indented by three spaces, which \s{3} relies on.
    string message =
        @"Record #1 with LeadRecordID 4 and MTN of (813) 555-1234 has 4 errors:
   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed.
Record #2 with LeadRecordID 5 and MTN of (813) 555-4321 has 1 errors:
   Shipping Street Address cannot be blank";
    MatchCollection mc = Regex.Matches(message, pattern);
    foreach (Match m in mc)
    {
        Console.WriteLine("RecordNumber = \"{0}\"", m.Groups["RecordNumber"].Value);
        Console.WriteLine("LeadRecordId = \"{0}\"", m.Groups["LeadRecordId"].Value);
        Console.WriteLine("NumberOfErrors = \"{0}\"", m.Groups["NumberOfErrors"].Value);
        Console.WriteLine("Errors:");
        // The "Error" group sits inside a repeated group, so it records one Capture per error line.
        foreach (Capture capture in m.Groups["Error"].Captures)
        {
            Console.WriteLine("\t{0}", capture.Value);
        }
        Console.WriteLine("------------------------");
    }
    Console.ReadLine();
}
Note that I changed both the regular expression and the code that extracts the matches: I use the Group.Captures property to retrieve every capture of the "Error" group.
Output:
RecordNumber = "1"
LeadRecordId = "4"
NumberOfErrors = "4"
Errors:
    Shipping Street Address cannot be blank
    Shipping City cannot be blank
    Shipping Zipcode cannot be blank
    Errors exist in secondary records #2, #3, #4, record not processed.
------------------------
RecordNumber = "2"
LeadRecordId = "5"
NumberOfErrors = "1"
Errors:
    Shipping Street Address cannot be blank
------------------------
I can't tell without seeing your code, but you probably need Regex.Matches as opposed to Regex.Match.
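To illustrate the difference, here is a minimal sketch. The class name MatchVersusMatches, the helper CountRecords, and the shortened sample text are made up for this example and are not taken from the question's code. Regex.Match returns only the first hit, while Regex.Matches returns every non-overlapping hit.

```csharp
using System;
using System.Text.RegularExpressions;

static class MatchVersusMatches
{
    // Hypothetical helper: counts the record headers that Regex.Matches finds.
    public static int CountRecords(string text)
    {
        return Regex.Matches(text, @"Record #(?<RecordNumber>\d+)").Count;
    }

    static void Main()
    {
        // Shortened, assumed sample input with two record headers.
        string text = "Record #1 with LeadRecordID 4\r\nRecord #2 with LeadRecordID 5";

        // Regex.Match stops at the first hit, so only record 1 is seen.
        Match first = Regex.Match(text, @"Record #(?<RecordNumber>\d+)");
        Console.WriteLine(first.Groups["RecordNumber"].Value); // prints 1

        // Regex.Matches keeps scanning and finds both records.
        Console.WriteLine(CountRecords(text)); // prints 2
    }
}
```

If only the first record ever shows up in your results, this is the most likely cause.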
Which option do you set for matching: Singleline or Multiline?
I think you need to change your regex and use the Multiline option.
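One possible reading of this suggestion, as a rough sketch: with RegexOptions.Multiline, ^ matches at the start of every line, so the indented error lines can be picked out directly. RegexOptions.Singleline is a different switch; it makes . match newline characters as well. The class name MultilineSketch, the helper IndentedLines, and the assumption that every error line starts with three spaces are mine, not the question's.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MultilineSketch
{
    // Hypothetical helper: returns the text of every line that starts with three spaces.
    public static List<string> IndentedLines(string text)
    {
        var lines = new List<string>();
        // RegexOptions.Multiline lets ^ match after each newline, not only at the start of the input.
        foreach (Match m in Regex.Matches(text, @"^ {3}(?<Error>[^\r\n]+)", RegexOptions.Multiline))
        {
            lines.Add(m.Groups["Error"].Value);
        }
        return lines;
    }

    static void Main()
    {
        // Assumed sample input: error lines are indented by three spaces.
        string text = "Record #1 has 2 errors:\r\n   Shipping City cannot be blank\r\n   Shipping Zipcode cannot be blank";
        foreach (string line in IndentedLines(text))
        {
            Console.WriteLine(line);
        }
    }
}
```

This collects error lines from the whole message without tying them to a record, so it complements rather than replaces the per-record patterns in the other answers.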