Groups in a C# regular expression
I'm using the following tester to try and figure out this regex: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
My input: 123stringA 456 stringB
My pattern: ([0-9]{3})(.*?)
The pattern will eventually be a date but for this question's sake, I'll keep it simple and use my simplified input.
The way I understand this pattern, it's "give me 3 digits ([0-9]{3}), followed by any number of characters of any kind (.*), until it reaches the next match (?)".
What I want/expect out of this test is 2 matches with 2 groups each:
Match 1
Group 1 - 123
Group 2 - stringA
Match 2
Group 1 - 456
Group 2 - stringB

For some reason, the tester at the link I provided sees that there is a second group, but it's coming up blank. I have done this with PHP before and it seemed to work as I described, but in C# I'm seeing different results. Any help you can provide would be appreciated.
I should also note that the input could span multiple lines.
EDIT:
Here's the actual input: 2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension 2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager
For match 1 I'm wanting to get: 2011-08-09 09:25:57 and ,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension
and for match 2: 2011-08-09 09:25:57 and ,493 [8] Orchard.Environment.Extensions.ExtensionManager
I'm trying to find a good way to parse an error log that's in one giant text file, keeping the date each error happened together with the details that went along with it.
The first group matches 3 digits. The second group matches an empty string, because nothing in the pattern forces the .*? to match more than an empty string.
.* means match any character (other than a newline, by default) zero or more times. The trailing ? makes the quantifier lazy, so it matches as few characters as possible, and with nothing after it in the pattern that minimum is zero.
Try this pattern instead: ([0-9]{3})([a-zA-Z]*)
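To make the difference concrete, here is a minimal sketch that runs the original lazy pattern and the letters-only pattern against the sample input from the question. The class name GroupDemo and the helper Describe are hypothetical names made up for this illustration; only Regex.Matches and the Groups collection come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns one "group1|group2" string per match.
    public static string[] Describe(string pattern, string input)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var lines = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            GroupCollection g = matches[i].Groups;
            lines[i] = g[1].Value + "|" + g[2].Value;
        }
        return lines;
    }

    public static void Main()
    {
        const string input = "123stringA 456 stringB";

        // Lazy .*? with nothing after it: group 2 is empty in every match.
        foreach (string line in Describe(@"([0-9]{3})(.*?)", input))
            Console.WriteLine(line);   // prints "123|" then "456|"

        // Letters only: group 2 stops at the first non-letter.
        foreach (string line in Describe(@"([0-9]{3})([a-zA-Z]*)", input))
            Console.WriteLine(line);   // prints "123|stringA" then "456|"
    }
}
```

With this sample input the second match still has an empty group 2, because "456" is followed by a space and a space is not in [a-zA-Z].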
According to your comment, this is what you want to match:
2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension 2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager - Error loading extension
This expression matches the date in the first capturing group, and everything after it, up to the next date or the end of the string, in the second capturing group.
(\d{4}(?:-\d{2}){2})(.*?)(?=(?:\d{4}(?:-\d{2}){2}|$))
See it here on Regexr
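As a rough sketch of how that expression could be used from C#, the snippet below splits the sample log into date/details pairs. The class name LogSplitDemo and the method Split are hypothetical; RegexOptions.Singleline and the Trim call are my own assumptions, added because the question says entries may span several lines and because the lazy group otherwise keeps the surrounding spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogSplitDemo
{
    // A date, then (lazily) everything up to the next date or the end of the input.
    private const string Pattern =
        @"(\d{4}(?:-\d{2}){2})(.*?)(?=(?:\d{4}(?:-\d{2}){2}|$))";

    // Hypothetical helper: returns { date, details } for each log entry.
    public static List<string[]> Split(string log)
    {
        var entries = new List<string[]>();
        // Assumption: Singleline lets "." cross line breaks for multi-line entries.
        foreach (Match m in Regex.Matches(log, Pattern, RegexOptions.Singleline))
        {
            entries.Add(new[] { m.Groups[1].Value, m.Groups[2].Value.Trim() });
        }
        return entries;
    }

    public static void Main()
    {
        const string log =
            "2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension " +
            "2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager";

        foreach (string[] entry in Split(log))
            Console.WriteLine(entry[0] + " => " + entry[1]);
    }
}
```

On the sample input this yields two entries. The first group holds only the date (2011-08-09), so the time of day ends up at the start of the second group.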
Not sure why the tool gives you that, but you can switch to this alternative pattern, which works in .NET:
([0-9]{3})([^0-9]*)
http://regexhero.net/tester/?id=155b8e2b-b851-46b9-8a84-b82f8d6963a1
Explanation:
In your previous pattern, the non-greedy version was matching 0 characters.
In the new one, [^0-9] says match any character other than the range 0-9 (note the negation specifier ^).
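A small sketch of the negated-class pattern run against the simplified input, with the captured groups wrapped in brackets so that whitespace is visible. NegatedClassDemo and Capture are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Hypothetical helper: returns "[group1][group2]" for each match.
    public static List<string> Capture(string input)
    {
        var results = new List<string>();
        // [^0-9]* consumes everything up to the next digit (or the end of the input).
        foreach (Match m in Regex.Matches(input, @"([0-9]{3})([^0-9]*)"))
        {
            results.Add("[" + m.Groups[1].Value + "][" + m.Groups[2].Value + "]");
        }
        return results;
    }

    public static void Main()
    {
        foreach (string line in Capture("123stringA 456 stringB"))
            Console.WriteLine(line);   // prints "[123][stringA ]" then "[456][ stringB]"
    }
}
```

The second group keeps the spaces around the text, so a Trim() on the captured value may be wanted. Unlike ".", a negated class such as [^0-9] also matches newlines, so this version keeps going across line breaks until it reaches a digit.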
Update: Given the actual input string (in the comments), the pattern changes to the following (it's a guess at what the OP wants to do):
,([0-9]{3})([^\n]*)
http://regexhero.net/tester/?id=155b8e2b-b851-46b9-8a84-b82f8d6963a1