Lookaround Regex engine
How does the regex engine work in terms of lookaround? my specific query is the following:
If I type ^(?!ABC)$
, will that look through an entire string for the substring ABC?
Secondly, how would I perform two operations in one regex? Say I wanted to find a string with an odd number of b
's and an even number of c
's?
EDIT: I only want to talk regarding regexes; I know this can be done in other ways.
This is what I'm using:
\b(?=[^A]*A([^A]*A[^A]*A)*[^开发者_Go百科A]*)([^C]*(C[^C]*C[^C]*)*[^C]*)\b
It fails on CC
but should only pull out strings with odd a
's and c
's.
It turns out that you can do this with a vanilla regular expression. It's just not pretty.
^((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*(b|c((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*b((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*c)((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*$
To understand the regular expression, draw a DFA with four states arranged in a square, linked forwards and backwards around the perimeter of the square. The horizontal links represent consuming a B, while the vertical links represent consuming a C. At the top left is the start state, representing having an even number of Cs and an even number of Bs. The top right is the accept state, reached by consuming a B. The bottom states are reached from the top states (and visa-versa) by consuming a C. Now, we can make any number of transitions that preserve the parity of our Cs and Bs, and we'll end up back at the start state. Then we consume a B, bringing us to our accept state. Then from there, so long as we maintain the parities, we're good. Two Cs maintain parity, as do two Bs. That's the (cc|bb)*
bit.
But you can also maintain parity by going to the opposite corner (via a C and a B in either order), doing as many BB/CC as you like, then returning to the corner you were in (again, either way). That's this bit: ((bc|cb)(bb|cc)*(bc|cb))*
So, we have ((cc|bb)*((bc|cb)(bb|cc)*(bc|cb))*)*
, being a set of transitions that gets us back where we started (call it a noop). You can make your odd B transition on either the top, in which case b
will do, or the bottom, in which case you need to get down to the bottom with a c
, do another noop, then have your b
, then another noop, then the c
back to the top.
I should mention that you can always take two regular expressions and generate a regular expression that matches only strings matched by both of the expressions.
^(?!ABC)$ will that look through an string for the substring ABC?
No, it alone matches the empty String only, it's zero-width negative lookahead. You can use it for things like "^(?!ABC)A..$"
matching ABD, ADC, but not ABC.
Secondly, how would I perform two operations in one regex? Say I wanted to find a string with an odd number of b's and an even number of c's?
Write the two patterns and put the first of them in a positive lookahead like (?=pat1)pat2
The regexp will match only empty strings.
Regexp is probably not the best option if you want to find the number of odd / even c's and b's.
The odd c pattern will look like: ^[^c]*c[^c]*(c[^c]*c)*[^c]*$
Not a very nice one, and if you want to add the even b pattern ^[^b]*(b[^b]*b)*[^b]*$
to it, the result will be all but completely unreadable.
精彩评论