What's the reason this antlr grammar is not matching this input?

2023-02-12 07:41 Question author:

I apologize in advance for asking such a similar question, but I'm rather frustrated, and I can probably explain the problem better in a new question.

I'm trying to rewrite parts of a structured file and thought of using ANTLR. Each line of these files is a sequence of {X} tokens. There is a character, '#', that I look ahead for so that I can rewrite parts of the file differently when I find it, but this character can occur in many parts of the file. However, whether it appears in the 4th {X} determines whether I need to rewrite part of the next {X} one way or another, or not at all (if there is nothing there).

Typical input:

{ 1 }{ Where to? # }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }

{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }

(I added the first # so you can see that it can appear there.) My grammar, for ANTLR 3.3, is below. It reports "line 1:31 no viable alternative at input '{ # }'" and "line 2:35 no viable alternative at input '{ 0 }'".

grammar VampireDialog;

options
{
output=AST;
ASTLabelType=CommonTree;
language=Java;
} 
tokens
{
REWRITE;
}

@parser::header {
import java.util.LinkedList;
import java.io.File;
}

@members {
//the lookahead type i'm using ( ()=> ) wraps everything after in a if, including the actions ( {} )
//that i need to use to prepare the arguments for the replace rules. Declare them global.
    String condition, command, wrappedCommand; boolean isEmpty, alreadyProcessed;

    public static void main(String[] args) throws Exception {
        File vampireDir = new File(System.getProperty("user.home"), "Desktop/Vampire the Masquerade - Bloodlines/Vampire the Masquerade - Bloodlines/Vampire/dlg/dummy");
        
        List<File> files = new LinkedList<File>();
        getFiles(256, new File[]{vampireDir}, files, new LinkedList<File>());
        for (File f : files) {
            if (f.getName().endsWith(".dlg")) {
                System.out.println(f.getName());
                VampireDialogLexer lex = new VampireDialogLexer(new ANTLRFileStream(f.getAbsolutePath(), "Windows-1252"));
                TokenRewriteStream tokens = new TokenRewriteStream(lex);
                VampireDialogParser parser = new VampireDialogParser(tokens);
                    Tree t = (Tree) parser.dialog().getTree();
                    System.out.println(t.toStringTree());
            }
        }
    }

    public static void getFiles(int levels, File[] search, List<File> files, List<File> directories) {
        for (File f : search) {
            if (!f.exists()) {
                throw new AssertionError("Search file array has non-existing files");
            }
        }
        getFilesAux(levels, search, files, directories);
    }

    private static void getFilesAux(int levels, File[] startFiles, List<File> files, List<File> directories) {
        List<File[]> subFilesList = new ArrayList<File[]>(50);
        for (File f : startFiles) {
            File[] subFiles = f.listFiles();
            if (subFiles == null) {
                files.add(f);
            } else {
                directories.add(f);
                subFilesList.add(subFiles);
            }
        }

        if (levels > 0) {
            for (File[] subFiles : subFilesList) {
                getFilesAux(levels - 1, subFiles, files, directories);
            }
        }
    }
}




/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
dialog : (ANY ANY ANY npc_or_pc ANY* NL*)*;

npc_or_pc : (ANY ANY) =>
         pc_marker  pc_condition
|        npc_marker npc_condition;


pc_marker  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?;
npc_marker :  t=ANY {!t.getText().trim().isEmpty() &&  t.getText().contains("#")}?;

pc_condition : '{' condition_text '}'
   { 
     condition = $condition_text.tree.toStringTree();
     isEmpty = condition.trim().isEmpty();
     command = "npc.Count()";
     wrappedCommand  =  "("+condition+") and "+ command;
     alreadyProcessed = condition.endsWith(command);
   }
   -> {alreadyProcessed}?   '{' condition_text '}'
   -> {isEmpty}?            '{' REWRITE[command] '}'
   ->                       '{' REWRITE[wrappedCommand] '}';

npc_condition : '{' condition_text '}'
   { 
     condition = $condition_text.tree.toStringTree();
     isEmpty = condition.trim().isEmpty();
     command = "npc.Reset()";
     wrappedCommand  =  "("+condition+") and "+ command;
     alreadyProcessed = condition.endsWith(command);
   }
   -> {alreadyProcessed}?   '{' condition_text '}'
   -> {isEmpty}?            '{' REWRITE[command] '}'
   ->                       '{' REWRITE[wrappedCommand] '}';

marker_text :    TEXT;
condition_text : TEXT;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
//in the parser ~('#') means: "match any token except the token that matches '#'" 
//and in lexer rules ~('#') means: "match any character except '#'"


TEXT : ~('{'|NL|'}')*;
ANY : '{' TEXT '}';
NL : ( '\r' | '\n'| '\u000C');

There are 3 things going wrong in your grammar:

Problems

#1

In your npc_or_pc rule:

npc_or_pc 
  :  (ANY ANY)=> pc_marker  pc_condition 
  |              npc_marker npc_condition
  ;

you shouldn't be looking ahead for ANY ANY, because that would satisfy both pc_marker and npc_marker. You should look ahead for pc_marker followed by ANY (or pc_condition).
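To see why the (ANY ANY)=> lookahead cannot choose between the two alternatives, it can help to mimic the checks in plain Java. The sketch below is not ANTLR code: the class and method names (LookaheadSketch, isPcMarker, isNpcMarker, anyAnyLookahead, pcMarkerLookahead) are made up for illustration, and each token is modelled as a plain string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the lookahead decisions; not generated by ANTLR.
final class LookaheadSketch {

    // Mirrors the semantic predicate in pc_marker.
    static boolean isPcMarker(String tokenText) {
        return !tokenText.trim().isEmpty() && !tokenText.contains("#");
    }

    // Mirrors the semantic predicate in npc_marker.
    static boolean isNpcMarker(String tokenText) {
        return !tokenText.trim().isEmpty() && tokenText.contains("#");
    }

    // Models (ANY ANY)=> : true whenever two more tokens are available,
    // regardless of what the first one contains.
    static boolean anyAnyLookahead(List<String> tokens, int pos) {
        return pos + 1 < tokens.size();
    }

    // Models (pc_marker ANY)=> : additionally checks the marker predicate.
    static boolean pcMarkerLookahead(List<String> tokens, int pos) {
        return pos + 1 < tokens.size() && isPcMarker(tokens.get(pos));
    }

    public static void main(String[] args) {
        // 4th and 5th fields of the two sample lines from the question.
        List<String> npcLine = Arrays.asList("{ # }", "{ }");
        List<String> pcLine = Arrays.asList("{ 0 }", "{ }");

        // The ANY ANY lookahead says "yes" for both lines ...
        System.out.println(anyAnyLookahead(npcLine, 0) + " " + anyAnyLookahead(pcLine, 0));
        // ... while the pc_marker lookahead only says "yes" for the PC line.
        System.out.println(pcMarkerLookahead(npcLine, 0) + " " + pcMarkerLookahead(pcLine, 0));
    }
}
```

In this model the first check is true for both sample lines, so it carries no information about which alternative to take; only the second check separates them.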

#2

In both your pc_condition and npc_condition rules:

pc_condition 
  :  '{' condition_text '}'
  ;

npc_condition 
  :  '{' condition_text '}'
  ;

you're using the tokens { and } but the lexer will never create such tokens. As soon as the lexer sees a { it will always be followed by TEXT '}', so the only tokens that the lexer produces will be of type ANY and NL: those are the only tokens available for the parser, which brings us to problem 3:
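A rough way to convince yourself of this is to approximate the two lexer rules with a regular expression and list the token types that come out. The sketch below is a hand-written stand-in, not the generated VampireDialogLexer; the class name LexerSketch and the method tokenTypes are hypothetical, and it assumes input shaped like the sample lines (only brace groups and line breaks).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex approximation of the ANY and NL lexer rules.
final class LexerSketch {

    // Group 1 ~ ANY : '{' TEXT '}'      Group 2 ~ NL : '\r' | '\n' | '\u000C'
    private static final Pattern TOKEN =
            Pattern.compile("(\\{[^{}\\r\\n\\f]*\\})|([\\r\\n\\f])");

    // Returns the sequence of token type names for the given input.
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<String>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Every token must start exactly where the previous one ended.
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            types.add(m.group(1) != null ? "ANY" : "NL");
            pos = m.end();
        }
        return types;
    }

    public static void main(String[] args) {
        // A '{' is always consumed as part of a whole ANY token,
        // so no separate '{', '}' or TEXT token shows up in the list.
        System.out.println(tokenTypes("{ 1 }{ # }{ }\n"));
    }
}
```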

#3

In your rules marker_text and condition_text:

marker_text    : TEXT;
condition_text : TEXT;

you're using the token TEXT, which will never be a part of the token stream (see #2).

Solutions

#1

Change the look ahead to look for pc_marker instead:

npc_or_pc 
  :  (pc_marker ... )=> pc_marker  ...
  |                     npc_marker ...
  ;

#2

Remove both the pc_condition and npc_condition rules and replace them by ANY tokens:

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker  ANY
  |                    npc_marker ANY
  ;

#3

Remove both the marker_text and condition_text rules; you no longer need them, since pc_condition and npc_condition are already gone.

Demo

Here's your modified grammar:

grammar VampireDialog;

dialog 
  :  (line {System.out.print($line.text);})* EOF
  ;

line
  :  ANY ANY ANY npc_or_pc ANY* NL+
  ;

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker  ANY {System.out.print("PC  :: ");}
  |                    npc_marker ANY {System.out.print("NPC :: ");}
  ;


pc_marker  
  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
  ;

npc_marker 
  :  t=ANY {!t.getText().trim().isEmpty() &&  t.getText().contains("#")}?
  ;

TEXT : ~('{'|NL|'}')*;
ANY  : '{' TEXT '}';
NL   : ( '\r' | '\n'| '\u000C');

or even the slightly shorter equivalent:

grammar VampireDialog;

dialog 
  :  (line {System.out.print($line.text);})* EOF
  ;

line
  :  ANY ANY ANY npc_or_pc ANY+ NL+
  ;

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker {System.out.print("PC  :: ");}
  |                    ANY       {System.out.print("NPC :: ");}
  ;

pc_marker  
  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
  ;

ANY  : '{' ~('{'|NL|'}')* '}';
NL   : ( '\r' | '\n'| '\u000C');

which can be tested with:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = 
                "{ 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }\n" + 
                "{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }\n";
        ANTLRStringStream in = new ANTLRStringStream(source);
        VampireDialogLexer lexer = new VampireDialogLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        VampireDialogParser parser = new VampireDialogParser(tokens);
        parser.dialog();
    }
}

which will print the following to the console:

NPC :: { 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }
PC  :: { 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }
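The demo only classifies each line; it does not perform the rewrite the question was after. As one possible next step, the wrapping logic from the question's pc_condition and npc_condition actions could be applied to the text of the ANY token that follows the marker. The sketch below is plain Java under that assumption; the class name ConditionRewriteSketch, the methods inner and rewrite, and the sample condition G.Example == 1 are hypothetical. Note that the text of an ANY token includes its braces, so the braces are stripped before testing for emptiness.

```java
// Hypothetical helper that mirrors the rewrite cases from the question.
final class ConditionRewriteSketch {

    // Returns the text between the braces of a well-formed ANY token, trimmed.
    static String inner(String anyTokenText) {
        String s = anyTokenText.trim();
        return s.substring(1, s.length() - 1).trim();
    }

    // command is "npc.Count()" for a PC line and "npc.Reset()" for an NPC line,
    // as in the question's grammar actions.
    static String rewrite(String conditionToken, String command) {
        String condition = inner(conditionToken);
        if (condition.endsWith(command)) {
            return conditionToken;                  // already processed: keep as is
        }
        if (condition.isEmpty()) {
            return "{ " + command + " }";           // nothing there: command only
        }
        return "{ (" + condition + ") and " + command + " }";  // wrap the old condition
    }

    public static void main(String[] args) {
        System.out.println(rewrite("{ }", "npc.Reset()"));
        String once = rewrite("{ G.Example == 1 }", "npc.Count()");
        System.out.println(once);
        // Applying the rewrite a second time leaves the field unchanged.
        System.out.println(rewrite(once, "npc.Count()"));
    }
}
```

The result could then be written back through the TokenRewriteStream that the question already sets up, though how to wire that in depends on the rest of the grammar.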
