开发者

Lookaround Regex engine

How does the regex engine work in terms of lookaround? my specific query is the following:

If I type ^(?!ABC)$, will that look through an entire string for the substring ABC?

Secondly, how would I perform two operations in one regex? Say I wanted to find a string with an odd number of b's and an even number of c's?

EDIT: I only want to talk regarding regexes; I know this can be done in other ways.

This is what I'm using:

\b(?=[^A]*A([^A]*A[^A]*A)*[^开发者_Go百科A]*)([^C]*(C[^C]*C[^C]*)*[^C]*)\b

It fails on CC but should only pull out strings with odd a's and c's.


It turns out that you can do this with a vanilla regular expression. It's just not pretty.

^((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*(b|c((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*b((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*c)((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*$

To understand the regular expression, draw a DFA with four states arranged in a square, linked forwards and backwards around the perimeter of the square. The horizontal links represent consuming a B, while the vertical links represent consuming a C. At the top left is the start state, representing having an even number of Cs and an even number of Bs. The top right is the accept state, reached by consuming a B. The bottom states are reached from the top states (and visa-versa) by consuming a C. Now, we can make any number of transitions that preserve the parity of our Cs and Bs, and we'll end up back at the start state. Then we consume a B, bringing us to our accept state. Then from there, so long as we maintain the parities, we're good. Two Cs maintain parity, as do two Bs. That's the (cc|bb)* bit.

But you can also maintain parity by going to the opposite corner (via a C and a B in either order), doing as many BB/CC as you like, then returning to the corner you were in (again, either way). That's this bit: ((bc|cb)(bb|cc)*(bc|cb))*

So, we have ((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*, being a set of transitions that gets us back where we started (call it a noop). You can make your odd B transition on either the top, in which case b will do, or the bottom, in which case you need to get down to the bottom with a c, do another noop, then have your b, then another noop, then the c back to the top.

I should mention that you can always take two regular expressions and generate a regular expression that matches only strings matched by both of the expressions.


^(?!ABC)$ will that look through an string for the substring ABC?

No, it alone matches the empty String only, it's zero-width negative lookahead. You can use it for things like "^(?!ABC)A..$" matching ABD, ADC, but not ABC.

Secondly, how would I perform two operations in one regex? Say I wanted to find a string with an odd number of b's and an even number of c's?

Write the two patterns and put the first of them in a positive lookahead like (?=pat1)pat2


The regexp will match only empty strings.

Regexp is probably not the best option if you want to find the number of odd / even c's and b's.


The odd c pattern will look like: ^[^c]*c[^c]*(c[^c]*c)*[^c]*$ Not a very nice one, and if you want to add the even b pattern ^[^b]*(b[^b]*b)*[^b]*$ to it, the result will be all but completely unreadable.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜