开发者

How do I find the language from a regular expression?

How would I find the language for the following regular expressions over the alphabet {a, b}?

aUb*
(ab*Uc)
ab*Ubc*
a*bc*Uac

EDIT: Before i get downvoted like crazy, i'd appreciate it if someone could show me the steps towards solving these problems, not just the solution. Maybe even walking me through one so I can do the rest on my own.

Thanks!开发者_如何学Go


Edit: short answer, * means "zero or more repetitions" in almost all regex/grammar syntaxes include perl5 and RFC 5234. * typically binds more tightly than concatenation and alternation.

You say you want a language over the alphabet (a, b), but include c and U in your expressions. I'm going to assume that you want a language grammar over the alphabet (a, b, c) in a form like BNF given a regular expression where U is a low-precedence union operator, similar to | in perl re.

In that case,

a|b*

is equivalent to the BNF:

L := a
   | <M>
M := epsilon
   | b <M>

The * operator means zero or more, so the first clause in M is the base case, and the second clause is a recursive use of M that includes a terminal b.

L is just either a single terminal a or the nonterminal M.

(ab*|c)
L ::= a <M>
    | c
M ::= epsilon
    | b <M>

M is derived the same way as above, and the rest is self explanatory.

ab*|bc*
L ::= a <M>
    | b <N>
M ::= epsilon
    | b <M>
N ::= epsilon
    | c <N>

N is derived in the same way as M above.

a*bc*|ac

The * in most regular expression languages binds more tightly than concatenation, so this is the same as

(a*b(c*))|(ac)

which boils down to

L ::= <M> b <N>
    | a c
M ::= epsilon
    | a <M>
N ::= epsilon
    | c <N>

In general to convert a regex to BNF, you simply use adjacency in a regex to mean adjaceny in BNF, and U or | in a regex means | in BNF.

If you define a nonterminal <X> ::= x then you can handle x* thus:

L ::= epsilon
    | <X> <L>

With the same nonterminal <X> ::= x then you can handle x+ thus:

L ::= <X>
    | <L> <X>

That gets you the repetition operators * and + which leaves ?. x? is simply

L ::= epsilon
    | <X>


Although Mike gave grammars generating the languages denoted by the regular expressions, your assignment requests the languages themselves. Because you're dealing with regular expressions, your answers must be regular sets.

Recall the definition of regular sets over an alphabet:

Let Σ be an alphabet. The class of regular sets over Σ is the smallest class
containing ∅, {λ}, and {a}, for all a ∈ Σ, and closed under union, concatenation,
and Kleene star.

Now recall the definition of regular expressions over an alphabet:

Let Σ be an alphabet. The class of regular expressions over Σ is the smallest
class containing ∅, λ, and a, for all a ∈ Σ, and closed under union, concat-
enation, and Kleene star.

The translation, therefore, should be straightforward. In fact, it consists of inserting curly brackets around each letter! For example:

a ∪ b*  denotes {a} ∪ {b}*
ab* ∪ c denotes {a}{b}* ∪ {c}
...

If you want to express the language of each regular expression in set-builder notation, you can revert to the definitions of the operations over regular sets. Recall:

Let A and B be regular sets. Then
   1  A ∪ B = {x : x ∈ A ∨ x ∈ B}
   2. AB    = {xy : x ∈ A ∧ y ∈ B}
   3. A*    = ∪[i = 0 -> ∞]A^i

The regular sets can be translated into set builder notation by substitution of the definitions of the regular set operations. To avoid introducing nested set-builder notation, I've used equality in conjunction with the definition of concatenation to express the concatenation of regular sets.

{a} ∪ {b}*    = {w : w ∈ {a} ∨ w ∈ ∪[i = 0 -> ∞]{b}^i}
{a}{b}* ∪ {c} = {w : (w = xy ∧ (x ∈ {a} ∧ y ∈ ∪[i = 0 -> ∞]{b}^i)) ∨ w ∈ {c}}
...

You should now be able to find the languages denoted by the remaining expressions without difficulty.


If you know what star, union and concatenation mean, these should be easy. The first is a union b star. According to order of operations, this means a union (b star). Union means anything in the language on the left or anything in the language on the right. a means the language consisting of the length-one string a; b star means the language consisting of any string which consists of zero or more b symbols and nothing else. So this language is {empty, a, b, bb, bbb, bbbb, ...}. In the second one, ab* means all strings consisting of an a followed by zero or more b symbols. Do star first, then concatenation, then union, obeying the given explicit parentheses.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜