Is it possible to have a grammar where a "keyword" can also be treated as a "non-keyword"?

Posted: 2023-01-18 17:05

I have the following grammar in ANTLRWorks 1.4. I'm playing around with ideas for implementation of a parser in a text-adventure game creator, where the user will specify the various allowable commands for his game.

grammar test;

parse       :   cmd EOF;


cmd         :   putSyn1 gameObject inSyn1 gameObject;

putSyn1     :   Put | Place | Drop ;

inSyn1      :   In | Into | Within;


gameObject  :   det obj;

det         :   The | A | An | ;

obj          :  Word obj | Word;


Space       :       (' ' | '\t' | '\r' | '\n'){$channel=HIDDEN;};
Put         :   'put';
Place       :   'place';
Drop        :   'drop';
In          :   'in';
Into        :   'into';
Within      :   'within';
The         :   'the';
A           :   'a';
An          :   'an';

Word        :   ('a'..'z' | 'A'..'Z')+;

I'm just getting a feel for the various subtleties involved (like I did here).

This time, using ANTLR, I'm wondering if I can parse input such as:

put wood in fire place

That is, "wood" and "fire place" are the gameObjects above. However, "place" is also a synonym for "put". So this would be equally valid:

place wood in fire place

ANTLR gives me a NoViableAltException when trying to parse the last "place" token. I want to recognize "fire place" as a gameObject.

So is this sort of thing possible in ANTLR? Is it possible in a grammar at all?

On the side, I'm working on a manual implementation that uses a weird custom data structure with bits of NFA, dictionaries and whatnot. But I still need more time and must sacrifice a few brain cells to design the required search & insertion algorithms.

But if this is possible in ANTLR, I could just use the generated C# file, yah?

Sure. PL/I is famous for not having any reserved words, i.e., you can use a keyword (e.g., IF) as a variable name anywhere it isn't needed as a keyword:

 IF  IF = 1  THEN  ELSE=3;  ELSE END=4;

Building a parser that does this is harder. You can't do this "simply" in the lexer, because the lexer doesn't know the context, so it can't tell whether an identifier should be treated as a keyword or not.
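To make that concrete, here is a minimal Python sketch of a context-free lexer for the grammar in the question. It is not ANTLR's actual lexer; the `KEYWORDS` table and the `tokenize` helper are names I made up for illustration. It labels both occurrences of "place" the same way, so the parser gets a Place token where `obj` needs a Word.

```python
# Hypothetical keyword table mirroring the lexer rules in the question.
KEYWORDS = {
    "put": "Put", "place": "Place", "drop": "Drop",
    "in": "In", "into": "Into", "within": "Within",
    "the": "The", "a": "A", "an": "An",
}


def tokenize(text):
    """Label each whitespace-separated word without looking at its context."""
    return [(KEYWORDS.get(word, "Word"), word) for word in text.lower().split()]


tokens = tokenize("place wood in fire place")
for kind, word in tokens:
    print(kind, word)
```

The last token comes out as Place rather than Word, which is the situation the question describes.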

There are several ways out. When an identifier-like entity is found:

1) Make the lexer ask the parser, "do you want a keyword now?" If the answer is yes, produce a keyword; otherwise produce an identifier. Getting the parser to cooperate here might be hard. It may also be that the parser doesn't know, because it has to see more input to decide. Consider Fortran's famous FORMAT statement:

     FORMAT ( A1, I2, ... ) X

You can't tell when you see the word "FORMAT" whether it is a keyword or an identifier; you have to scan ahead arbitrarily far to inspect X. If X is anything but an end of statement, the FORMAT word is the name of an array identifier; if X is end-of-statement, it's a FORMAT keyword and statement.

2) Emit both a keyword (if the identifier matches one) and the identifier, and make the parser try both. Most parsers won't handle this well, but GLR parsers can handle this with aplomb if designed reasonably. This handles the FORMAT problem trivially by pushing it into the parser's lookahead capability. (ANTLR isn't GLR. Our DMS Software Reengineering Toolkit has exactly such a GLR parser, and we use this trick a lot.)

3) Place all identifier-like things into a hash table. Use a recursive descent parser (ANTLR generates one); when that parser wants a keyword, it simply inspects the identifier it got to verify it is the keyword it needs. If it doesn't want a keyword, it simply uses the identifier as an identifier. I don't know how to implement this trick with ANTLR since I don't use it. This won't handle the "can't decide without lookahead" case well.
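Option 3 can be sketched as a small hand-written recursive descent parser in Python. This is only an illustration of the idea for the `cmd` rule in the question, not ANTLR output; `parse_command`, `ParseError` and the three word sets are hypothetical names. Every word is lexed the same way, and a word counts as a keyword only at the point where the parser asks for one.

```python
# Python sets are hash tables; these mirror putSyn1, inSyn1 and det.
PUT_SYNONYMS = {"put", "place", "drop"}
IN_SYNONYMS = {"in", "into", "within"}
DETERMINERS = {"the", "a", "an"}


class ParseError(Exception):
    pass


def parse_command(text):
    """Parse: putSyn1 gameObject inSyn1 gameObject."""
    words = text.lower().split()
    pos = 0

    def expect_keyword(allowed, what):
        # The parser wants a keyword here, so it checks the word it got.
        nonlocal pos
        if pos >= len(words) or words[pos] not in allowed:
            found = words[pos] if pos < len(words) else "end of input"
            raise ParseError(f"expected {what}, found {found!r}")
        pos += 1
        return words[pos - 1]

    def parse_game_object(stop_words):
        # No keyword is wanted here: every word up to a stop word is
        # treated as part of the object name, even "place".
        nonlocal pos
        det = None
        if pos < len(words) and words[pos] in DETERMINERS:
            det = words[pos]
            pos += 1
        start = pos
        while pos < len(words) and words[pos] not in stop_words:
            pos += 1
        if pos == start:
            raise ParseError("expected an object name")
        return {"det": det, "name": " ".join(words[start:pos])}

    verb = expect_keyword(PUT_SYNONYMS, "a put-synonym")
    target = parse_game_object(IN_SYNONYMS)
    prep = expect_keyword(IN_SYNONYMS, "an in-synonym")
    container = parse_game_object(frozenset())
    return {"verb": verb, "object": target, "prep": prep, "container": container}


result = parse_command("place wood in fire place")
print(result["verb"], "|", result["object"]["name"], "|", result["container"]["name"])
```

Note the limitation mentioned above: the first object ends at the first in-synonym, so an object whose name contains "in", "into" or "within" would be split in the wrong place. Deciding that case needs lookahead or backtracking that this sketch does not attempt.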

I'd handle something like this with the lexer instead of the parser -- have the lexer do a "maximum munch", so it recognizes "fire place" as a single token, and only recognizes "place" as a separate token if it's not immediately preceded by "fire".

With that, the parser doesn't have to notice that the same sequence of characters in the input happens to form all or part of two entirely separate tokens.
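A rough Python sketch of that lexer idea follows. It assumes the game's object names are known up front (the `OBJECT_NAMES` list here is invented for the example), and `build_lexer` and `tokenize` are hypothetical helpers, not part of ANTLR. Multi-word object names are tried longest first, so "fire place" wins over a bare "place".

```python
import re

# Hypothetical vocabulary; a real game would load its own object list.
OBJECT_NAMES = ["wood", "fire", "fire place"]
KEYWORDS = {
    "put": "Put", "place": "Place", "drop": "Drop",
    "in": "In", "into": "Into", "within": "Within",
    "the": "The", "a": "A", "an": "An",
}


def build_lexer(object_names, keywords):
    """Build a regex that tries object names, then keywords, then plain words.

    Assumes both collections are non-empty.
    """
    def alternation(phrases):
        # Python regex alternation is ordered, so put longer phrases first.
        ordered = sorted(phrases, key=len, reverse=True)
        return "|".join(
            r"\s+".join(re.escape(word) for word in phrase.split())
            for phrase in ordered
        )

    pattern = (
        r"\s*(?:(?P<Object>(?:" + alternation(object_names) + r")\b)"
        r"|(?P<Keyword>(?:" + alternation(keywords) + r")\b)"
        r"|(?P<Word>[A-Za-z]+))"
    )
    return re.compile(pattern, re.IGNORECASE)


def tokenize(text, lexer, keywords):
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = lexer.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        kind = match.lastgroup
        lexeme = " ".join(match.group(kind).lower().split())
        if kind == "Keyword":
            kind = keywords[lexeme]  # e.g. "place" becomes a Place token
        tokens.append((kind, lexeme))
        pos = match.end()
    return tokens


lexer = build_lexer(OBJECT_NAMES, KEYWORDS)
tokens = tokenize("place wood in fire place", lexer, KEYWORDS)
for kind, lexeme in tokens:
    print(kind, lexeme)
```

The trade-off is that the lexer now depends on the object vocabulary, so it has to be rebuilt whenever the game author adds an object.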

Tags: antlr antlrworks grammar parsing
