Groups in a C# regular expression

I'm using the following tester to try and figure out this regex: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

My input: 123stringA 456 stringB

My pattern: ([0-9]{3})(.*?)

The pattern will eventually be a date but for this question's sake, I'll keep it simple and use my simplified input.

The way I understand this pattern, it's "give me 3 digits ([0-9]{3}), followed by any number of characters of any kind (.*), until it reaches the next match (?)".

What I want/expect out of this test is 2 matches with 2 groups each:

Match 1

   Group 1 - 123

   Group 2 - stringA

Match 2

   Group 1 - 456

   Group 2 - stringB

For some reason, the tester at the link I provided sees that there is a second group, but it's coming up blank. I have done this with PHP before and it seemed to work as I described, but in C# I'm seeing different results. Any help you can provide would be appreciated.

I should also note that the input could span multiple lines.

EDIT

Here's the actual input: 2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension 2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager

For match 1 I'm wanting to get: 2011-08-09 09:25:57 and ,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension

and for match 2: 2011-08-09 09:25:57 and ,493 [8] Orchard.Environment.Extensions.ExtensionManager

I'm trying to find a good way to parse an error log that is one giant text file, keeping the date each error happened together with the details that went along with it.


The first group matches 3 digits. The second group matches an empty string: the lazy .*? is the last thing in the pattern, so nothing forces it to consume any characters.


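.* means "match any character (except a newline, by default) zero or more times". The trailing ? makes the quantifier lazy, so it matches as few characters as possible. Since nothing follows it in the pattern, the minimum is zero characters.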
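A minimal sketch of that behaviour, using the pattern and sample input from the question (the class name is only illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

class LazyQuantifierDemo
{
    static void Main()
    {
        const string input = "123stringA 456 stringB";

        // Pattern from the question: the lazy .*? is the last element,
        // so it is free to stop after zero characters.
        foreach (Match m in Regex.Matches(input, @"([0-9]{3})(.*?)"))
        {
            Console.WriteLine($"Group 1 = '{m.Groups[1].Value}', Group 2 = '{m.Groups[2].Value}'");
        }
        // Prints:
        // Group 1 = '123', Group 2 = ''
        // Group 1 = '456', Group 2 = ''
    }
}
```

Both matches are found, and group 2 exists in each, but it is always the empty string.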

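Try this pattern: ([0-9]{3})([a-zA-Z]*)

Note that with the sample input, group 2 of the second match is still empty, because a space rather than a letter follows 456.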


According to your comment, this is what you want to match:

2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension 2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager - Error loading extension

This expression captures the date in the first group. The second group captures everything after it, up to the next date or the end of the string.

(\d{4}(?:-\d{2}){2})(.*?)(?=(?:\d{4}(?:-\d{2}){2}|$))

See it here on Regexr
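As a rough sketch, the expression above could be used from C# like this. The two-line log string is an assumption based on the sample in the question. RegexOptions.Singleline is added on the assumption that an entry can span several lines, since . does not match a newline without it. Group 1 holds only the date, so the time stays at the start of group 2.

```csharp
using System;
using System.Text.RegularExpressions;

class LogEntryDemo
{
    static void Main()
    {
        // Assumed sample: two log entries separated by a newline.
        const string log =
            "2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension\n" +
            "2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager - Error loading extension";

        // Singleline lets '.' cross line breaks, so multi-line entries stay together.
        var entry = new Regex(
            @"(\d{4}(?:-\d{2}){2})(.*?)(?=(?:\d{4}(?:-\d{2}){2}|$))",
            RegexOptions.Singleline);

        foreach (Match m in entry.Matches(log))
        {
            Console.WriteLine("Date:    " + m.Groups[1].Value);
            Console.WriteLine("Details: " + m.Groups[2].Value.Trim());
        }
    }
}
```

One caveat with this approach: a date in the same yyyy-MM-dd format inside a message body would also end the lazy group and split that entry in two.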


Not sure why the tool gives you that, but you can switch to this alternative pattern, which works in .NET:

([0-9]{3})([^0-9]*)

http://regexhero.net/tester/?id=155b8e2b-b851-46b9-8a84-b82f8d6963a1

Explanation:

In your previous pattern, the nongreedy version was matching 0 characters.

In the new one, [^0-9] says match any character other than the range 0-9 (note the negation ^ specifier).
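For illustration, here is a small sketch that runs the negated-class pattern against the sample input from the question. The Trim() call is an addition, not part of the pattern; it drops the spaces that [^0-9]* also captures.

```csharp
using System;
using System.Text.RegularExpressions;

class NegatedClassDemo
{
    static void Main()
    {
        const string input = "123stringA 456 stringB";

        // [^0-9]* is greedy and stops only at the next digit (or the end of the input).
        foreach (Match m in Regex.Matches(input, @"([0-9]{3})([^0-9]*)"))
        {
            Console.WriteLine($"{m.Groups[1].Value} -> {m.Groups[2].Value.Trim()}");
        }
        // Prints:
        // 123 -> stringA
        // 456 -> stringB
    }
}
```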

Update: Given the actual input string (in the comments), the pattern changes to the following. This is a guess at what the OP wants to do:

,([0-9]{3})([^\n]*)

http://regexhero.net/tester/?id=155b8e2b-b851-46b9-8a84-b82f8d6963a1
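A hedged sketch of how the updated pattern behaves, assuming each log entry sits on its own line (the sample string is adapted from the question). If the whole log were on a single line, [^\n]* would run to the end of the input and only one match would be found.

```csharp
using System;
using System.Text.RegularExpressions;

class MillisecondAnchorDemo
{
    static void Main()
    {
        // Assumed sample: one entry per line, separated by "\n".
        const string log =
            "2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension\n" +
            "2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager";

        // Group 1: the three digits after the comma (the milliseconds).
        // Group 2: the rest of that line.
        foreach (Match m in Regex.Matches(log, @",([0-9]{3})([^\n]*)"))
        {
            Console.WriteLine($"{m.Groups[1].Value} | {m.Groups[2].Value.Trim()}");
        }
    }
}
```

This pattern does not capture the date and time that precede the comma, so it would need to be extended to keep the timestamp with each entry. With Windows line endings (\r\n), group 2 would also end in a \r, which the Trim() call here removes.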
