Is it possible to have a grammar where a "keyword" can also be treated as a "non-keyword"?
I have the following grammar in ANTLRWorks 1.4. I'm playing around with ideas for implementation of a parser in a text-adventure game creator, where the user will specify the various allowable commands for his game.
grammar test;
parse : cmd EOF;
cmd : putSyn1 gameObject inSyn1 gameObject;
putSyn1 : Put | Place | Drop ;
inSyn1 : In | Into | Within;
gameObject : det obj;
det : The | A | An | ;
obj : Word obj | Word;
Space : (' ' | '\t' | '\r' | '\n'){$channel=HIDDEN;};
Put : 'put';
Place : 'place';
Drop : 'drop';
In : 'in';
Into : 'into';
Within : 'within';
The : 'the';
A : 'a';
An : 'an';
Word : ('a'..'z' | 'A'..'Z')+;
I'm just getting a feel for the various subtleties involved (like I did here).
This time, using ANTLR, I'm wondering if I can parse input such as:
put wood in fire place
That is, "wood" and "fire place" are the gameObjects above. However, "place" is also a synonym for "put". So this would be equally valid:
place wood in fire place
ANTLR gives me a NoViableAltException when t开发者_如何学JAVArying to parse the last "place" token. I want to recognize "fire place" as a gameObject.
So is this sort of thing possible in ANTLR? Is it possible in grammar?
On the side, I'm working on a manual implementation that uses a weird custom data structure with bits of NFA, Dictionary's and whatnot. But I still need more time and must sacrifice a few brain cells to design the required search & insertion algorithms.
But if this is possible in ANTLR, I could just use the generated C# file, yah?
Sure. PL/1 is famous for not having any reserved words, e.g., you can use keywords (e.g., IF) as a variable name anywhere it isn't needed as a keyword:
IF IF = 1 THEN ELSE=3; ELSE END=4;
Building a parser that does this is harder. You can't do this "simply" in the lexer, because it doesn't know the context in which identifier might be a keyword, or not.
There are several ways out. When an identifier like entity is found:
1) Make the lexer ask the parser, " do you want a keyword now? ". In that case, produce a keyword. Getting the parser to cooperate here might be hard. It may also be that the parser doesn't know, because it has to see more input to decide. Consider Fortran's famous format statement:
FORMAT ( A1, I2, ... ) X
You can't tell when you see the word "FORMAT" if it is a keyword, or an identifier; you have to scan ahead arbitrarily far to inspect X. If X is anything but a end of statement, the FORMAT word is the name of an array identifier; if X is end-of-statment, its a FORMAT keyword and statement.
2) Emit both a keyword (if the identifier matches one) and the identifier, and make the parser try both. Most parsers won't handle this well, but GLR parsers can handle this with aplomb if designed reasonably. This handles the FORMAT problem trivially by pushing into the parser's lookahead capability. (ANTLR isn't GLR. Our DMS Software Reengineering Toolkit has exactly such a GLR parser, and we use this trick a lot).
3) Place all identifier-like things into a hash table. Use a recursive descent parser (ANTLR is one); when that parser wants a keyword, it simply inspects the identifier it got to verify it is the keyword it needs. If it doesn't want a keyword, it simply uses the identifier as an identifier. I don't know how to implement this trick with ANTLR since I don't use it. This won't handle the "can't decide without lookahead" case well.
I'd handle something like this with the lexer instead of the parser -- have the lexer do a "maximum munch", so it recognizes "fire place" as a single token, and only recognizes "place" as a separate token if it's not immediately preceded by "fire".
With that, the parser doesn't have to notice that the same sequence of characters in the input happen to form all or part of two entirely separate tokens.
精彩评论