Write recursive-descent parsing to parse epsilon(ε) in Java

2023-01-09 12:24 问答作者：

For example,

EBNF

A ::= B c;

B ::= T1 | T2 | ε

T1 ::= a

T2 ::= b

parseA()
{
switch(currentToken.kind){
case Token.a :
    parseT1();
case Token.b :
    parseT2();
break;

case <epsilon> :
    
break;
default:
    // report error
break;
}
}

How to write the pars开发者_JS百科er parse the epsilon ( set of empty string ) in Java?

Whether it's Java or any language, the point is you have a stream of tokens. You don't "get" a token and decide what to do. Rather you "look at" the next token, and if you can use it, you "accept" it.

See the difference?

Do not "get" and then "decide".
Do "look and decide" and then "accept".
This is the heart of the concept of "lookahead".

So, ParseA calls ParseB.

Parse B looks at the next token in the stream.
If it is "a" it accepts it and returns.
If it is "b" it accepts it and returns.
Otherwise it simply returns to ParseA (without accepting anything).

Then ParseA looks at the next token.
If it is "c" it accepts it and returns.
Otherwise it fails.

Make sense?

To do this, you should add a sentinel token to the end of the stream, that is never accepted. At the end, that should be the only token left in the stream. If not, you've got a syntax error consisting of extra garbage at the end.

epsilon is just a marker for "empty string allowed here", so you don't need to parse anything; the descent is finished. It seems unlikely that you would actually have a token for that though; you need to detect if there's no token available or if the next token can be used in another production.

For example, you might get the input c. You need to realize that this matches the production A ::= B c;, because B can be reduced to epsilon -- there is no epsilon token, you just need to realize that the B rule there is optional and in that case needs to be skipped over to reduce c to A

As it stands, this is a bit on the simplistic side to make much sense as a parser. The whole thing can be written as a simple regular expression: "[ab]?c". If you really insist on writing it as a parser, the code could be something like:

Boolean parseA() { 
    // an A is a B followed by a `c`:
    return parseB() && (get_token() == Token.c);
}

Boolean parseB  {
    // Get a token. 
    //     If it's an `a` or a `b`, consume it. 
    //     Otherwise, throw it back (match null string by consuming nothing).
    // Either way, return true (successful match).
    //
    token current_token = get_token();
    if (token != Token.a && token != Token.b)
        push_back(current_token);
    return true;
}

Edit (in response to comment on another answer): No, when you're matching B, you should not look for a Token.c. As far as B cares, there are three possibilities: match 'a', match 'b', or match nothing at all. It's then up to the part that parses A to check that it has a B followed by a Token.c.

For example, if you were to change the grammar to something like:

A ::= B C

B ::= a | b | ε

C ::= c | d

Since 'B' still has the same definition, you should not have to change it just because some other definition changed. Likewise, you might add to the grammar to allow (for example) an arbitrary string of B's followed by a C.

You 'parse' an epsilon when you see anything in the FOLLOW set of the current non-terminal. So since FOLLOW(B) = { 'c' }, you use case Token.c: in the switch in parseB for it

继续阅读：compiler-construction parsing recursive-descent

Write recursive-descent parsing to parse epsilon(ε) in Java

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？