Simplify this regular expression
I'm doing some pre-exam exercises for my compilers class, and needed to simplify this regular expression.
(a U b)*(a U e)b* U (a U b)*(b U e)a*
Quite obviously, the e is the empty string, and the U stands for union.
So far, I think one of the (a U b)* can be removed, as the union of a U a = a. However, I c开发者_JAVA百科an't find any other simplifications, and am not doing so well with the other problems thus far. :(
Any help is appreciated, thanks very much!
First translate to an english description of the language:
(a U b)*(a U e)b* U (a U b)*(b U e)a*
Translates to:
Any sequence of a
s or b
s, followed by an optional a
, followed by any number of b
s.
OR
Any number of a
s and b
s, followed by an optional b
, follwed by any number of a
s
There is a lot of overlap here - at least (a U b)*(a U e)
is exactly the same as (a U b)*
, because "Any sequence of a
s and b
s" necessarily either ends with an a
or epsilon (as any string can end with epsilon) so those groups can be eliminated, leaving
(a U b)*b* U (a U b)*a*
Translates to:
Any sequence of a
s or b
s, followed by any number of b
s.
OR
Any number of a
s and b
s, follwed by any number of a
s
Now the first section of those to outermost groups is the same, so lets collapse those into one
(a U b)*(a* U b*)
Translates to:
Any sequence of a
s or b
s, followed by any number of a
s OR by any number b
s.
now hold on a minute, "Any sequence of As and Bs" necessarily ends with "Any sequence of a
s OR any sequence of b
s", which means anything which matches the first part can match the whole regex (because the second part can have a length of zero) so why don't we just make it
(a U b)*
Ta Da. Simple.
Little rusty on regex, but if * still represents the "zero or more ocurrences" you can replace:
(a U e)b* for (a U b)*
which leaves the first part with:
(a U b)*(a U b)* = (a U b)*
On the right side, you have that
(b U e)a* = (b U a)*
Now, since a U b = b U a, you get:
(a U b)*(a U b)*
on the right hand side, which leaves just
(a U b)* U (a U b)* = (a U b)*
I think that's it...
I think the whole thing is equivalent to (a U b)*
(or in most regex grammars, (a|b)*
)
I´ll give you an idea of how I would solve it: (not very formal and no guarantee)
Look at the left side of the main U:
(a U b)* - What does it mean? A combination of a´s and b´s of length n, where n >= 0.
Next comes (a U e). What do we have here? An a or empty word. If we wanted that a we could just have gotten it in the previous part already. If we want the e, well we can leave it out anyway. Please note here that we dont have to take an a, because we have the option to chose e. So we can skip this whole part.
What is next? b*. What is that? As many b´s as we want. We could have gotten those in the first part also! we can leave that out!
So the only thing on the left is (a U b)*.
Lets have a look on the right side:
Ok this is easy now, we can use the same idea it is just different letters.
We will also get (a U b)* in the same way.
So in the end we have (a U b)* U (a U b)* which you know is equal to (a U b)*.
精彩评论