Why is this ANTLR grammar not matching this input?
I apologize in advance for asking such a similar question, but I'm rather frustrated, and I will probably be able to explain it better in a new question.
I'm trying to rewrite parts of a structured file and thought of using ANTLR. These files consist of lines of {X} tokens. There is a character, '#', that I look ahead for so that I can rewrite parts of the file differently when I find it. This character can occur in many parts of the file. However, when it appears in the 4th {X} of a line, it determines whether I need to rewrite part of the next {X} one way, another way, or not at all (if there is nothing there).
Typical input:
{ 1 }{ Where to? # }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }
{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }
(I added the first # so you can see it can be there.) My grammar, for ANTLR 3.3, follows. It reports "line 1:31 no viable alternative at input '{ # }'" and "line 2:35 no viable alternative at input '{ 0 }'":
grammar VampireDialog;
options
{
output=AST;
ASTLabelType=CommonTree;
language=Java;
}
tokens
{
REWRITE;
}
@parser::header {
import java.util.LinkedList;
import java.io.File;
}
@members {
//the lookahead type i'm using ( ()=> ) wraps everything after in a if, including the actions ( {} )
//that i need to use to prepare the arguments for the replace rules. Declare them global.
String condition, command, wrappedCommand; boolean isEmpty, alreadyProcessed;
public static void main(String[] args) throws Exception {
File vampireDir = new File(System.getProperty("user.home"), "Desktop/Vampire the Masquerade - Bloodlines/Vampire the Masquerade - Bloodlines/Vampire/dlg/dummy");
List<File> files = new LinkedList<File>();
getFiles(256, new File[]{vampireDir}, files, new LinkedList<File>());
for (File f : files) {
if (f.getName().endsWith(".dlg")) {
System.out.println(f.getName());
VampireDialogLexer lex = new VampireDialogLexer(new ANTLRFileStream(f.getAbsolutePath(), "Windows-1252"));
TokenRewriteStream tokens = new TokenRewriteStream(lex);
VampireDialogParser parser = new VampireDialogParser(tokens);
Tree t = (Tree) parser.dialog().getTree();
System.out.println(t.toStringTree());
}
}
}
public static void getFiles(int levels, File[] search, List<File> files, List<File> directories) {
for (File f : search) {
if (!f.exists()) {
throw new AssertionError("Search file array has non-existing files");
}
}
getFilesAux(levels, search, files, directories);
}
private static void getFilesAux(int levels, File[] startFiles, List<File> files, List<File> directories) {
List<File[]> subFilesList = new ArrayList<File[]>(50);
for (File f : startFiles) {
File[] subFiles = f.listFiles();
if (subFiles == null) {
files.add(f);
} else {
directories.add(f);
subFilesList.add(subFiles);
}
}
if (levels > 0) {
for (File[] subFiles : subFilesList) {
getFilesAux(levels - 1, subFiles, files, directories);
}
}
}
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
dialog : (ANY ANY ANY npc_or_pc ANY* NL*)*;
npc_or_pc : (ANY ANY) =>
pc_marker pc_condition
| npc_marker npc_condition;
pc_marker : t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?;
npc_marker : t=ANY {!t.getText().trim().isEmpty() && t.getText().contains("#")}?;
pc_condition : '{' condition_text '}'
{
condition = $condition_text.tree.toStringTree();
isEmpty = condition.trim().isEmpty();
command = "npc.Count()";
wrappedCommand = "("+condition+") and "+ command;
alreadyProcessed = condition.endsWith(command);
}
-> {alreadyProcessed}? '{' condition_text '}'
-> {isEmpty}? '{' REWRITE[command] '}'
-> '{' REWRITE[wrappedCommand] '}';
npc_condition : '{' condition_text '}'
{
condition = $condition_text.tree.toStringTree();
isEmpty = condition.trim().isEmpty();
command = "npc.Reset()";
wrappedCommand = "("+condition+") and "+ command;
alreadyProcessed = condition.endsWith(command);
}
-> {alreadyProcessed}? '{' condition_text '}'
-> {isEmpty}? '{' REWRITE[command] '}'
-> '{' REWRITE[wrappedCommand] '}';
marker_text : TEXT;
condition_text : TEXT;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
//in the parser ~('#') means: "match any token except the token that matches '#'"
//and in lexer rules ~('#') means: "match any character except '#'"
TEXT : ~('{'|NL|'}')*;
ANY : '{' TEXT '}';
NL : ( '\r' | '\n'| '\u000C');
There are 3 things going wrong in your grammar:
Problems
#1
In your npc_or_pc rule:
npc_or_pc
: (ANY ANY)=> pc_marker pc_condition
| npc_marker npc_condition
;
you shouldn't be looking ahead for ANY ANY, because that would satisfy both pc_marker and npc_marker. You should look ahead for pc_marker followed by ANY (or pc_condition).
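To see why looking ahead for ANY ANY cannot tell the two alternatives apart, the checks can be modelled in plain Java outside ANTLR. This is only a sketch: the class and method names are hypothetical, and each token is represented as a plain string holding the full token text, braces included, as the ANY rule would produce it. It does not reproduce how ANTLR evaluates predicates internally.

```java
import java.util.Arrays;
import java.util.List;

final class LookaheadSketch {
    // A string counts as an ANY token here if it is wrapped in braces.
    static boolean isAny(String t) {
        return t.startsWith("{") && t.endsWith("}");
    }

    // Models the lookahead (ANY ANY)=>: it only checks that two
    // brace-delimited tokens follow and never inspects their content.
    static boolean anyAny(List<String> tokens, int i) {
        return i + 1 < tokens.size() && isAny(tokens.get(i)) && isAny(tokens.get(i + 1));
    }

    // Mirrors the condition used in the pc_marker rule.
    static boolean isPcMarker(String t) {
        return !t.trim().isEmpty() && !t.contains("#");
    }

    // Mirrors the condition used in the npc_marker rule.
    static boolean isNpcMarker(String t) {
        return !t.trim().isEmpty() && t.contains("#");
    }

    public static void main(String[] args) {
        List<String> npcLine = Arrays.asList("{ 1 }", "{ Where to? # }", "{ Where to? }", "{ # }", "{ }");
        List<String> pcLine = Arrays.asList("{ 2 }", "{ Just drive. }", "{ Just drive. }", "{ 0 }", "{ }");
        // Index 3 is the fourth token of each line.
        System.out.println(anyAny(npcLine, 3) + " " + anyAny(pcLine, 3));                   // true true
        System.out.println(isPcMarker(npcLine.get(3)) + " " + isPcMarker(pcLine.get(3)));   // false true
    }
}
```

The first printed line shows that the ANY ANY check succeeds for both sample lines, so it cannot choose between the alternatives; only the marker condition distinguishes them. Note also that, because the token text includes the braces, the trim().isEmpty() part of the condition can never be true for an ANY token, so in this model an empty { } in fourth position would count as a PC marker.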
#2
In both your pc_condition and npc_condition rules:
pc_condition
: '{' condition_text '}'
;
npc_condition
: '{' condition_text '}'
;
you're using the tokens { and }, but the lexer will never create such tokens. As soon as the lexer sees a {, it will always be followed by TEXT '}', so the only tokens that the lexer produces will be of type ANY and NL: those are the only tokens available to the parser, which brings us to problem 3:
#3
In your rules marker_text and condition_text:
marker_text : TEXT;
condition_text : TEXT;
you're using the token TEXT, which will never be a part of the token stream (see #2).
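The token stream described in #2 and #3 can be approximated without ANTLR. The following is a rough sketch, not the generated lexer: the class name and the regular expression are assumptions made for illustration, and unlike a real lexer it silently skips any text that sits outside braces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenTypeSketch {
    // Group 1 approximates ANY: '{', any run of characters other than
    // braces and line breaks, then '}'.
    // Group 2 approximates NL: one carriage return, line feed or form feed.
    private static final Pattern TOKEN =
        Pattern.compile("(\\{[^{}\\r\\n\\f]*\\})|([\\r\\n\\f])");

    // Returns the token type names in the order they occur in the input.
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<String>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            types.add(m.group(1) != null ? "ANY" : "NL");
        }
        return types;
    }

    public static void main(String[] args) {
        // prints [ANY, ANY, ANY, ANY, NL]
        System.out.println(tokenTypes("{ 2 }{ Just drive. }{ 0 }{ }\n"));
    }
}
```

For input shaped like the sample lines, every brace pair is consumed whole, so no separate {, } or TEXT token is left over for parser rules to match.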
Solutions
#1
Change the look ahead to look for pc_marker instead:
npc_or_pc
: (pc_marker ... )=> pc_marker ...
| npc_marker ...
;
#2
Remove both the pc_condition and npc_condition rules and replace them with ANY tokens:
npc_or_pc
: (pc_marker ANY)=> pc_marker ANY
| npc_marker ANY
;
#3
Remove both the marker_text and condition_text rules; you don't need them anymore since you have already removed pc_condition and npc_condition.
Demo
Here's your modified grammar:
grammar VampireDialog;
dialog
: (line {System.out.print($line.text);})* EOF
;
line
: ANY ANY ANY npc_or_pc ANY* NL+
;
npc_or_pc
: (pc_marker ANY)=> pc_marker ANY {System.out.print("PC :: ");}
| npc_marker ANY {System.out.print("NPC :: ");}
;
pc_marker
: t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
;
npc_marker
: t=ANY {!t.getText().trim().isEmpty() && t.getText().contains("#")}?
;
TEXT : ~('{'|NL|'}')*;
ANY : '{' TEXT '}';
NL : ( '\r' | '\n'| '\u000C');
or even the slightly shorter equivalent:
grammar VampireDialog;
dialog
: (line {System.out.print($line.text);})* EOF
;
line
: ANY ANY ANY npc_or_pc ANY+ NL+
;
npc_or_pc
: (pc_marker ANY)=> pc_marker {System.out.print("PC :: ");}
| ANY {System.out.print("NPC :: ");}
;
pc_marker
: t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
;
ANY : '{' ~('{'|NL|'}')* '}';
NL : ( '\r' | '\n'| '\u000C');
which can be tested with:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String source =
"{ 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }\n" +
"{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }\n";
ANTLRStringStream in = new ANTLRStringStream(source);
VampireDialogLexer lexer = new VampireDialogLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
VampireDialogParser parser = new VampireDialogParser(tokens);
parser.dialog();
}
}
which will print the following to the console:
NPC :: { 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }
PC :: { 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }
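The demo only classifies each line; it does not perform the rewrite described in the question. The decision order of the question's rewrite alternatives (already processed, empty, otherwise wrapped) could be sketched in plain Java as follows. The class and method names are hypothetical, trimming the condition before testing it is an assumption rather than something taken from the original actions, and the condition strings in main are used purely as sample text.

```java
final class ConditionRewriteSketch {
    // Follows the order of the rewrite alternatives in the question:
    // already processed -> unchanged, empty -> command only, otherwise wrap.
    static String rewrite(String condition, boolean pcLine) {
        String command = pcLine ? "npc.Count()" : "npc.Reset()";
        String trimmed = condition.trim();   // assumption: surrounding spaces are irrelevant
        if (trimmed.endsWith(command)) {
            return trimmed;
        }
        if (trimmed.isEmpty()) {
            return command;
        }
        return "(" + trimmed + ") and " + command;
    }

    // Strips the braces of an ANY token, rewrites the inner text and
    // wraps the result in braces again.
    static String rewriteToken(String anyToken, boolean pcLine) {
        String inner = anyToken.substring(1, anyToken.length() - 1);
        return "{ " + rewrite(inner, pcLine) + " }";
    }

    public static void main(String[] args) {
        System.out.println(rewriteToken("{ }", true));
        // { npc.Count() }
        System.out.println(rewriteToken("{ G.Cabbie_Line = 1 }", false));
        // { (G.Cabbie_Line = 1) and npc.Reset() }
    }
}
```

Applying rewriteToken to its own output leaves the token unchanged, which corresponds to the alreadyProcessed check in the question's grammar.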