开发者

What does this regular expression mean?

I have 开发者_开发百科got a question on regular expression. Though simple, I got some contracting answers from my professor. I just wanted to clarify it here.

(a+bc)* - What are the 4 smallest distinct pattern which this regex can give ?

I was expecting it to be epsilon (empty string), abc, aabc, aaabc.

But , his explanation was (a+bc) results in either a or bc. So his answer was epsilon (empty string), a, bc && aa(because of the star)

Which one is correct ? Is there any link which explains these kind of regex. I checked out wikipedia but they dont have these kind of things. Could you point me to some resource which actaully deals with the above kind ? Thanks in advance !


It sounds like your professor confused + for |.

For (a+bc)*, the answer would be ε, abc, aabc, aaabc as you said, while for (a|bc)*, the answer would be ε, a, aa, bc as he said.


You're correct, your professor is wrong (assuming there wasn't a misunderstanding between the two of you).

Note that there isn't one single regex language (there is a common definition for a Regular Language, but they aren't the same thing), though many share common features, including those used in your example. It's conceivable that someone could have regexes where '+' means alternation, but typically '+' is "one or more of the preceding" and '|' is for alternation.

As for a regular expression resource, check Regular-Expressions.info. It lists the features of various regular expression implementations. Each implementation often has their own page (such as perlre), which may have more or better information.


I think for regexes '+' and '|' means same thing in reg expression. Only the context make the difference especially the Kleene star.

eg

(a)* +(bc)* means -- ε, a, aa, bc

but both (a+bc)* and (a|bc) means same thing as - ε, a, aa, abc etc (converting to NFA will clear the doubt. Here in NFA you have 2 alternativies either a or bc but the *means you can go back using an ε and choose whichever path you want.)

eg from wiki page of RE Examples:

a|b* denotes {ε, "a", "b", "bb", "bbb", ...} (a|b)* denotes the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", ...} ab*(c|ε) denotes the set of strings starting with "a", then zero or more "b"s and finally optionally a "c": {"a", "ac", "ab", "abc", "abb", "abbc", ...}

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜