.NET Regular Expression - How to get a match count?

I have a program that highlights text. The terms that are highlighted are defined by our users. They can specify wildcards at the beginning or the end of a term by using the '*' character. In the end, the users are looking for us to also provide them with the number of hits for each term.

For simplicity's sake, let's assume I'm given just two terms: justice and just*. The program would run some regex that looks something like this:

{(?:nocapture^|[^\p{L}\p{N}']|\b)((justice)|(just[\S]*))(?:nocapture$|[^\p{L}\p{N}']|\b)}

And let's assume that the block of text this user wants to highlight and get a count for is this:

This is justice!

While it correctly finds the word "justice", I only get a hit on the capture group for "justice". It doesn't match against the capture group with "just[\S]*".

So, is there any way to write the regular expression (or use .NET options) to force the engine to attempt a match against every capture group that is separated by ORs? Or will it always use only the left-most capture group when they are separated by ORs?

Thanks!


It's always the first one in order of appearance if both would match. Of course if matching one pattern causes matching to fail at subsequent positions, the engine will backtrack and attempt matching the other patterns in the capture group.

If you think about it, when the engine sees a capture group with multiple patterns that match, it has to pick one of them somehow as the "tentative correct result" before attempting to match the rest of the expression. That somehow is "leftmost is preferred".
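To make the "leftmost is preferred" rule concrete, here is a minimal C# sketch. It uses a simplified version of the pattern from the question: the custom boundary classes are replaced by a plain \b, which is an assumption made for brevity. The names AlternationDemo and Probe are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // Simplified stand-in for the question's pattern (assumption: \b replaces
    // the custom boundary classes). Group 1 is "justice", group 2 is "just*".
    const string Pattern = @"\b(?:(justice)|(just\S*))";

    // Reports which of the two groups took part in the first match.
    public static (bool Justice, bool Wildcard) Probe(string text)
    {
        Match m = Regex.Match(text, Pattern);
        return (m.Groups[1].Success, m.Groups[2].Success);
    }

    static void Main()
    {
        var (justice, wildcard) = Probe("This is justice!");
        // Prints: justice group: True, just* group: False
        Console.WriteLine($"justice group: {justice}, just* group: {wildcard}");
    }
}
```

Because the first alternative already succeeds at that position, the second group is never tried there, so Groups[2].Success stays false even though just\S* could also have matched.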


It's faster for the regex engine if it does not have to check every alternative in a group with ORs. As soon as one of the alternatives matches (reading left to right), the entire group matches.

This is just like if you have a conditional statement:

int num = 2;

// has to check both values
if(num == 1 || num == 2) { /* stuff */ }

// only has to check the first one, can skip over the second compare for speed
if(num == 2 || num == 3) { /* stuff */ }

// has to check both values
if(num == 3 || num == 4) { /* stuff */ }

So, to answer your question, as far as I know, no, there is no way. But why would you want to enforce it?
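If the goal is just a hit count per term, one possible workaround is to keep the combined pattern for highlighting and run one small regex per term for counting. The sketch below is only an illustration under assumptions: TermCounter, ToPattern and CountHits are hypothetical names, \b stands in for the question's own boundary logic, and a leading or trailing '*' is translated to \S* as in the question's pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TermCounter
{
    // Converts a user term such as "just*" into a regex pattern.
    // Assumption: '*' is only meaningful at the start or end of a term.
    public static string ToPattern(string term)
    {
        string prefix = term.StartsWith("*", StringComparison.Ordinal) ? @"\S*" : "";
        string suffix = term.EndsWith("*", StringComparison.Ordinal) ? @"\S*" : "";
        // Regex.Escape keeps any other special characters in the term literal.
        return @"\b" + prefix + Regex.Escape(term.Trim('*')) + suffix + @"\b";
    }

    // Counts matches for each term independently, so overlapping terms
    // (e.g. "justice" and "just*") each get credit for the same word.
    public static Dictionary<string, int> CountHits(string text, IEnumerable<string> terms)
    {
        var counts = new Dictionary<string, int>();
        foreach (string term in terms)
        {
            counts[term] = Regex.Matches(text, ToPattern(term)).Count;
        }
        return counts;
    }

    static void Main()
    {
        var counts = CountHits("This is justice!", new[] { "justice", "just*" });
        foreach (var pair in counts)
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

For the sample text, both "justice" and "just*" come back with a count of 1. The cost is one extra pass over the text per term, which may or may not matter depending on how many terms users define.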
