What does this regular expression mean?
I have 开发者_开发百科got a question on regular expression. Though simple, I got some contracting answers from my professor. I just wanted to clarify it here.
(a+bc)* - What are the 4 smallest distinct pattern which this regex can give ?
I was expecting it to be epsilon (empty string), abc, aabc, aaabc.
But , his explanation was (a+bc) results in either a or bc. So his answer was epsilon (empty string), a, bc && aa(because of the star)
Which one is correct ? Is there any link which explains these kind of regex. I checked out wikipedia but they dont have these kind of things. Could you point me to some resource which actaully deals with the above kind ? Thanks in advance !
It sounds like your professor confused +
for |
.
For (a+bc)*
, the answer would be ε, abc, aabc, aaabc as you said, while for (a|bc)*
, the answer would be ε, a, aa, bc as he said.
You're correct, your professor is wrong (assuming there wasn't a misunderstanding between the two of you).
Note that there isn't one single regex language (there is a common definition for a Regular Language, but they aren't the same thing), though many share common features, including those used in your example. It's conceivable that someone could have regexes where '+' means alternation, but typically '+' is "one or more of the preceding" and '|' is for alternation.
As for a regular expression resource, check Regular-Expressions.info. It lists the features of various regular expression implementations. Each implementation often has their own page (such as perlre), which may have more or better information.
I think for regexes '+' and '|' means same thing in reg expression. Only the context make the difference especially the Kleene star.
eg
(a)* +(bc)* means -- ε, a, aa, bc
but both (a+bc)* and (a|bc) means same thing as - ε, a, aa, abc etc (converting to NFA will clear the doubt. Here in NFA you have 2 alternativies either a or bc but the *means you can go back using an ε and choose whichever path you want.)
eg from wiki page of RE Examples:
a|b* denotes {ε, "a", "b", "bb", "bbb", ...} (a|b)* denotes the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", ...} ab*(c|ε) denotes the set of strings starting with "a", then zero or more "b"s and finally optionally a "c": {"a", "ac", "ab", "abc", "abb", "abbc", ...}
精彩评论