Regex that defines a regular language with {a,b} without a substring with exactly 3 b's (bbb)
Pretty much what the question says. I came up with
(ba)?(a + bb + bbbbb + aba)*(ab)?
Is there anything more readable? Or is this incorrect? I know you shouldn't really be doing this sorta thing with Regex when you can just go !~/bbb/ in your code, but it's a theory exercise.
Thanks.
Edit for Clarification: I'm not using |
to represent the OR bit in the Regex and using +
it instead. Sorry for the confusion.
Edit 2: {a,b}开发者_如何转开发
is for a language with just 'a' and 'b' characters. Not {mininum, maximum}. Sorry again.
Edit 3: Because this is part of a theory class, we're just dealing with the basics of Regex. The only things you're allowed to use are +, ?, () and *. You cannot use {minimum, maximum).
I think I have a working regex. Let b°
—which is a notation I invented just now—be the regex that matches zero or more b's, except it won't match three of them. This can be replaced by (ε | b | bb | bbbb+)
, so don't worry that I'm using magic or anything. Now I think that matching strings can be seen as repeating subpatterns of zero or more a's followed by b°
, which could be (a*b°)*
, but you need there to be at least one "a" in between sequences of b's. So your final regex is a*b°(a+b°)*
.
Since b°
can match the empty string, the initial a*
is superfluous as the a+
can pick up the initial a's just fine, so the regex can be optimized down to b°(a+b°)*
(thanks, wrikken).
Hmm, something like this?
^(a|(?<!b)b{1,2}(?!b)|b{4,})*$
edit:
Edit 3: Because this is part of a theory class, we're just dealing with the basics of Regex. The only things you're allowed to use are +, ?, () and *. You cannot use {minimum, maximum).
Pfff, talking about tying your hands behind your back... Simple solution: you cannot do it (^
& $
are requirements for it ever to work), and we need the |
. So, come up with a better conditions. Dropping the lookbehind & lookahead could be done, but isn't going to be pretty (at least, not without violating DRY):
^(b|bb|bbbb+)?(a+(b|bb|bbbb+)?)*$
You're matching a string without precisely 3 b's in a row. That means you're looking at substrings like "aa", "aba", "abba", and "abbbbb*a", where any of the exterior a's could be the beginning or end of the string, can be overlapped, and can be multiple. This suggests something like:
(a + ab + abb + abbbbb*)*
with appropriate additions to account for the missing a at the beginning of the string. There's a lot of repetitions, but that's how regular expressions work in basic form.
精彩评论