
Lazy regex doesn't work as expected C#

I have the following regex: a?\W*?b and the string ,.! ,b

When I search for a match I get ,.! ,b, not just b as I expected. Why is that? How can I modify the regex to get what I need?

Thank you for your help.


A lazy quantifier doesn't help here for what you want. Let's see what's happening.

The regex engine starts at the beginning of the string. It first tries to match a. It can't, but that's no problem since the a is optional.

Then, there is a lazy \W*? so the regex engine skips it but remembers the current position.

It then tries to match b. It can't, so it backtracks and successfully matches the , with \W*?. It then goes on to try and match b (because of the lazy quantifier). It still can't and backtracks again. This repeats a few times until finally the regex engine has arrived at the b. Now the match is complete - the regex engine declares success.
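The walkthrough above can be checked with a minimal sketch like the one below. The names LazyMatchDemo and FirstMatch are made up for illustration; only Regex.Match, the pattern and the input string come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // Hypothetical helper: returns the first match of pattern in input.
    public static Match FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern);
    }

    public static void Main()
    {
        Match m = FirstMatch(",.! ,b", @"a?\W*?b");
        // The match starts at index 0, because that is the earliest
        // position from which the whole pattern can succeed.
        Console.WriteLine($"index={m.Index}, value='{m.Value}'");
        // prints: index=0, value=',.! ,b'
    }
}
```

The lazy quantifier only decides how \W*? grows once a start position is fixed; it has no say in where the match starts.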

So the regex works as specified - just not as intended. Now the question is: What exactly do you want the regex to do?

For example, if what you really want is:

Match b alone, unless it's preceded by a and some non-word characters, in which case match everything from a to b, then use

b|a\W*b
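A quick way to try this alternation on both kinds of input is sketched below. The class name AlternationFix and the second test string a,.! ,b are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationFix
{
    // Pattern from the answer: a lone b, or a, non-word characters, then b.
    public const string Pattern = @"b|a\W*b";

    // Hypothetical helper: first match of Pattern in input.
    public static Match Find(string input)
    {
        return Regex.Match(input, Pattern);
    }

    public static void Main()
    {
        foreach (string input in new[] { ",.! ,b", "a,.! ,b" })
        {
            Match m = Find(input);
            Console.WriteLine($"'{input}' -> '{m.Value}' at index {m.Index}");
        }
        // prints:
        // ',.! ,b' -> 'b' at index 5
        // 'a,.! ,b' -> 'a,.! ,b' at index 0
    }
}
```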


A lazy expression is only lazy from the right, i.e. it will be as short as possible by removing characters on the right, but it will not remove characters on the left.

To make the match start later, you need a greedy expression before it that swallows the characters that you don't want to match.

Alternatively, as Tim showed, you can make the match start later by only matching the first character and the following separators if the first character exists.
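The "lazy only from the right" behaviour can be seen in a small sketch. The class name LazySideDemo, the extra test strings, and the .* prefix with a capturing group are assumptions chosen for illustration, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazySideDemo
{
    // Hypothetical helper: text of the first match.
    public static string Value(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    // Hypothetical helper: text of capturing group 1 in the first match.
    public static string Group1(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        // Lazy quantifier at the right end: trailing characters are dropped.
        Console.WriteLine(Value("b,.!", @"b\W*?"));            // prints: b

        // Lazy quantifier at the left end: leading characters are kept.
        Console.WriteLine(Value(",.!b", @"\W*?b"));            // prints: ,.!b

        // A greedy prefix swallows the unwanted characters; group 1 holds the rest.
        Console.WriteLine(Group1(",.! ,b", @".*(a?\W*?b)"));   // prints: b
    }
}
```

Note that in the last case the overall match is still the whole string; only group 1 is reduced to b.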


For example, the following might work: (a\W*)?b

To know better what might solve your problem, you should include more examples.


Your regexp matches the entire string like this:

  1. a, zero or one repetitions ("" in this case)
  2. Any non-word character (roughly, anything that is not a letter, digit or underscore), any number of repetitions, as few as possible (",.! ," in this case)
  3. b

In your case the regexp matches the entire string, and will therefore not find just the b (matches don't overlap, so the same part of the string is not reported a second time).

If you search in a string like ',.! ,db' it will find the b.
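Both cases can be compared with Regex.Matches, which returns all non-overlapping matches. This is only a sketch; the class name NonOverlapDemo and the helper AllMatches are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonOverlapDemo
{
    // Pattern from the question.
    public const string Pattern = @"a?\W*?b";

    // Hypothetical helper: the text of every non-overlapping match in input.
    public static string[] AllMatches(string input)
    {
        MatchCollection matches = Regex.Matches(input, Pattern);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        // One match covering the whole string; b is not reported separately.
        Console.WriteLine(string.Join(" | ", AllMatches(",.! ,b")));   // prints: ,.! ,b

        // The d blocks every earlier start position, so only b matches.
        Console.WriteLine(string.Join(" | ", AllMatches(",.! ,db")));  // prints: b
    }
}
```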


The a? says "I want either zero or one instance of a" - this is satisfied as there are zero instances, and followed by

\W*? says "I want zero or more non-word characters", which is satisfied by the punctuation and space characters, and finally

b says "match a letter b", which it does. So your whole string satisfies the regex.

It would help if you gave more examples of possible inputs before anyone suggests a possible solution.


Your example doesn't show why the a? is part of your regex, but to match only b in a string that looks like ,.! ,b you can use a lookbehind like this: (?<=\W*?)b.

This matches a b that is preceded by zero or more non-word characters (as few as possible). Because zero characters are allowed, the lookbehind always succeeds, so in practice this behaves like a plain b.

If you only want to match, say, a and b in a string such as a,.! ,b, you'll have to use capturing groups: (a?)\W*?(b), where group 1 will hold the a if present and group 2 the b.
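Reading the two groups might look like the sketch below. The class name GroupsDemo and the helper Capture are hypothetical; the pattern is the one given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Pattern from the answer: group 1 is the optional a, group 2 is b.
    public const string Pattern = @"(a?)\W*?(b)";

    // Hypothetical helper: returns the text of groups 1 and 2 of the first match.
    public static (string A, string B) Capture(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    public static void Main()
    {
        var withA = Capture("a,.! ,b");
        Console.WriteLine($"A='{withA.A}', B='{withA.B}'");       // prints: A='a', B='b'

        var withoutA = Capture(",.! ,b");
        Console.WriteLine($"A='{withoutA.A}', B='{withoutA.B}'"); // prints: A='', B='b'
    }
}
```

The overall match still spans the separators; the groups are what let you pick out a and b on their own.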


It's a mistake to speak of a regex as being greedy or non-greedy. You can use non-greedy quantifiers throughout the regex, but it will still try to start matching at the earliest opportunity, as you discovered. Similarly, a regex that uses only greedy quantifiers isn't guaranteed to return the longest possible match. For example,

Regex.Match("foo bar", @"\w+ (?:b|bar)")

...returns foo b, because alternation settles for the first alternative that works, even if a later one would result in a longer match. (Note that I'm talking about Perl-derived regex flavors like .NET's; some flavors, like awk and egrep, do indeed hold out for the longest possible match. But, since those flavors don't have non-greedy quantifiers, greedy isn't just the default mode, it's the only mode.)

In short, there's no such thing as a greedy or non-greedy regex, only greedy or non-greedy quantifiers.
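The alternation example can be tried with the alternatives in both orders. This is a sketch; the class name AlternationOrderDemo and the reordered pattern are additions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Hypothetical helper: text of the first match.
    public static string Value(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // The first alternative that works wins, even though \w+ is greedy.
        Console.WriteLine(Value("foo bar", @"\w+ (?:b|bar)"));  // prints: foo b

        // Listing the longer alternative first yields the longer match.
        Console.WriteLine(Value("foo bar", @"\w+ (?:bar|b)"));  // prints: foo bar
    }
}
```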
