开发者

Simplify this regular expression

I'm doing some pre-exam exercises for my compilers class, and needed to simplify this regular expression.

(a U b)*(a U e)b* U (a U b)*(b U e)a*

Quite obviously, the e is the empty string, and the U stands for union.

So far, I think one of the (a U b)* can be removed, as the union of a U a = a. However, I c开发者_JAVA百科an't find any other simplifications, and am not doing so well with the other problems thus far. :(

Any help is appreciated, thanks very much!


First translate to an english description of the language:

(a U b)*(a U e)b* U (a U b)*(b U e)a*

Translates to:


Any sequence of as or bs, followed by an optional a, followed by any number of bs.

OR

Any number of as and bs, followed by an optional b, follwed by any number of as


There is a lot of overlap here - at least (a U b)*(a U e) is exactly the same as (a U b)*, because "Any sequence of as and bs" necessarily either ends with an a or epsilon (as any string can end with epsilon) so those groups can be eliminated, leaving

(a U b)*b* U (a U b)*a*

Translates to:


Any sequence of as or bs, followed by any number of bs.

OR

Any number of as and bs, follwed by any number of as


Now the first section of those to outermost groups is the same, so lets collapse those into one

(a U b)*(a* U b*)

Translates to:


Any sequence of as or bs, followed by any number of as OR by any number bs.


now hold on a minute, "Any sequence of As and Bs" necessarily ends with "Any sequence of as OR any sequence of bs", which means anything which matches the first part can match the whole regex (because the second part can have a length of zero) so why don't we just make it

(a U b)*

Ta Da. Simple.


Little rusty on regex, but if * still represents the "zero or more ocurrences" you can replace:

(a U e)b* for (a U b)*

which leaves the first part with:

(a U b)*(a U b)* = (a U b)*

On the right side, you have that

(b U e)a* = (b U a)*

Now, since a U b = b U a, you get:

(a U b)*(a U b)*

on the right hand side, which leaves just

(a U b)* U (a U b)* = (a U b)*

I think that's it...


I think the whole thing is equivalent to (a U b)* (or in most regex grammars, (a|b)*)


I´ll give you an idea of how I would solve it: (not very formal and no guarantee)

Look at the left side of the main U:

(a U b)* - What does it mean? A combination of a´s and b´s of length n, where n >= 0.

Next comes (a U e). What do we have here? An a or empty word. If we wanted that a we could just have gotten it in the previous part already. If we want the e, well we can leave it out anyway. Please note here that we dont have to take an a, because we have the option to chose e. So we can skip this whole part.

What is next? b*. What is that? As many b´s as we want. We could have gotten those in the first part also! we can leave that out!

So the only thing on the left is (a U b)*.

Lets have a look on the right side:

Ok this is easy now, we can use the same idea it is just different letters.

We will also get (a U b)* in the same way.

So in the end we have (a U b)* U (a U b)* which you know is equal to (a U b)*.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜