
Eliminate extra spaces in a given ANTLR grammar

For any grammar I create in ANTLR, is it possible to parse some input and have the result of the parsing eliminate any extra spaces in it? A simple example:

int x=5;

If I write

int x      =          5         ; 

I would like the text to be changed to int x=5; without the extra spaces. Can the parser return the original text without extra spaces?


Can the parser return the original text without extra spaces?

Yes, you need to define a lexer rule that captures these spaces and then skip() them:

Space
  :  (' ' | '\t') {skip();}
  ;

which will cause spaces and tabs to be ignored.

PS. I'm assuming you're using ANTLR 3 with Java as the target language. The skip() call can be different in other targets (Skip() for C#, for example), and ANTLR 4 uses the lexer command -> skip instead of an embedded action. You may also want to include \r and \n chars in this rule.

EDIT

Let's say your language only consists of a couple of variable declarations. Assuming you know the basics of ANTLR, the following grammar should be easy to understand:

grammar T;

parse
  :  stat* EOF
  ;

stat
  :  Type Identifier '=' Int ';'
  ;

Type
  :  'int'
  |  'double'
  |  'boolean'
  ;

Identifier
  :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
  ;

Int
  :  '0'..'9'+
  ;

Space
  :  (' ' | '\t' | '\n' | '\r')+ {skip();}
  ; 

And you're parsing the source:

int x   =      5     ; double y     =5;boolean z      =    0  ;

which you'd like to change into:

int x=5;
double y=5;
boolean z=0;

Here's a way to embed code in your grammar and let the parser rules return custom objects (Strings, in this case):

grammar T;

parse returns [String str]
@init{StringBuilder buffer = new StringBuilder();}
@after{$str = buffer.toString();}
  :  (stat {buffer.append($stat.str).append('\n');})* EOF
  ;

stat returns [String str]
  :  Type Identifier '=' Int ';' 
     {$str = $Type.text + " " + $Identifier.text + "=" + $Int.text + ";";}
  ;

Type
  :  'int'
  |  'double'
  |  'boolean'
  ;

Identifier
  :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
  ;

Int
  :  '0'..'9'+
  ;

Space
  :  (' ' | '\t' | '\n' | '\r')+ {skip();}
  ; 

Test it with the following class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "int x   =      5     ; double y     =5;boolean z      =    0  ;";
        ANTLRStringStream in = new ANTLRStringStream(source);
        TLexer lexer = new TLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TParser parser = new TParser(tokens);
        System.out.println("Result:\n"+parser.parse());
    }
}

which produces:

Result:
int x=5;
double y=5;
boolean z=0;
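
To see what the generated lexer and parser are doing, here is a rough plain-Java sketch of the same idea that does not use ANTLR at all. The class name WhitespaceNormalizer and its methods are made up for illustration, and the regular expression is only an approximation of the lexer rules above (for instance, it does not check that the first word of a statement is really int, double or boolean). Whitespace is matched and thrown away, which is roughly the effect of skip(), and each statement is then rebuilt from the five remaining tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: mimics the Space/skip() idea by hand.
public class WhitespaceNormalizer {

    // A run of whitespace (discarded), a word, an integer, or one of '=' and ';'.
    private static final Pattern TOKEN =
            Pattern.compile("\\s+|[A-Za-z_][A-Za-z_0-9]*|[0-9]+|[=;]");

    // Splits the source into tokens and drops whitespace, like the Space rule does.
    public static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            String text = m.group();
            if (!Character.isWhitespace(text.charAt(0))) {
                tokens.add(text); // keep everything except whitespace
            }
            pos = m.end();
        }
        return tokens;
    }

    // Rebuilds each "Type Identifier = Int ;" statement on its own line.
    public static String normalize(String source) {
        List<String> tokens = tokenize(source);
        if (tokens.size() % 5 != 0) {
            throw new IllegalArgumentException("Incomplete statement");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i += 5) {
            if (!tokens.get(i + 2).equals("=") || !tokens.get(i + 4).equals(";")) {
                throw new IllegalArgumentException("Malformed statement near token " + i);
            }
            out.append(tokens.get(i)).append(' ')
               .append(tokens.get(i + 1)).append('=')
               .append(tokens.get(i + 3)).append(";\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "int x   =      5     ; double y     =5;boolean z      =    0  ;";
        System.out.print(normalize(source));
    }
}
```

Running main on the same source as above prints the three normalized statements, one per line. For anything beyond a toy language the ANTLR grammar is the better tool, since the parser rules check the statement structure for you.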
