ANTLRv3 not reading options

2023-03-05 22:03 问答作者：

I am very new to ANTLR and am trying to understand how the Lexer and Parser rules work. I'm experiencing issues with a grammar I've written that seem to be related to lexer tokens with multiple characters being seen as "matches" even when only the first few characters actually match. To demonstrate this, I have written a simple ANTLR 3 Grammar:

grammar test;
options {
    k=3;
}

@lexer::header { package test;}
@header {package test;}

sentence    :   (CHARACTER)*;

CHARACTER   :   'a'..'z'|' ';
SPECIAL     :   'special';

I'm using AntlrWorks to parse the following test input:

apple basic say sponsor speeds speckled specific wonder

The output I get 开发者_StackOverflow社区is:

apple basic say nsor ds led ic wonder

It seems to me that the LEXER is using k=1 and therefore matching my SPECIAL token with anything that includes the two letters 'sp'. Once it encounters the letters 'sp', it then matches sucessive characters within the SPECIAL literal until the actual input fails to match the expected token - at which point it throws an error (consuming that character) and then continues with the rest of the sentence. Each error is of the form:

  line 1:18 mismatched chracter 'o' expecting 'e'

However, this isn't the behaviour I'm trying to create. I wish to create a lexer token that matches the keyword ('special') - for use in other parser rules not included in this test example. However, I don't want other rules/input that just happens to include the same initial characters to be affected

To summarize:

How do I actually set antlr 3 options (such as k=2 or k=3 etc)? It seems to me, at least, that the options I'm trying to use here aren't being set.
Is there a better way to create parser or lexer rules to match a particular keyword in my input, without affecting processing of other parts of the input that don't contain a full match?

The k in the options { ... } section defines the look ahead of the parser, not the lexer.

Note that the grammar

CHARACTER   :   'a'..'z'|' ';
SPECIAL     :   'special';

is ambiguous: your 'special' could also be considered as 7 'a'..'z''s. Normally, it'd be lexed as follows:

grammar Test;

sentence : (special | word | space)+ EOF;
special  : SPECIAL;
word     : WORD;
space    : SPACE;

SPECIAL  : 'special';
WORD     : 'a'..'z'+;
SPACE    : ' ';

which will parse the input:

specia special specials

as follows:

ANTLRv3 not reading options

I.e. it gets (more or less) tokenized as a combination of LL(1) and "longest-matched". Sorry, I realize that's a bit vague, but the Definitive ANTLR Reference does not clarify this exactly (at least, I can't find it...). But I realize that this might not be what you're looking for.

AFAIK, the only way to produce single char-tokens and define keywords that are made up from these single char-tokens, is done by merging these two tokens in a single rule, and use predicates and manual look-ahead to see if they conform to a key word, and if not, change the type of the token in a "fall through" sub rule. A demo:

grammar test;

tokens {
  LETTER;
}

@lexer::members {
  // manual look ahead
  private boolean ahead(String text) {
    for(int i = 0; i < text.length(); i++) {
      if(input.LA(i+1) != text.charAt(i)) {
        return false;
      }
    }
    return true;
  }
}

sentence
  :  (t=. {System.out.printf("\%-7s :: '\%s'\n", tokenNames[$t.type], $t.text);})+ EOF
  ;

SPECIAL 
  :  {ahead("special")}?=> 'special'
  |  {ahead("keyword")}?=> 'keyword'
  |  'a'..'z' {$type = LETTER;} // Last option and no keyword is found: 
                                // change the type of this token
  ;

SPACE
  :  ' '
  ;

The parser generated from the above grammar can be tested with the class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("apple basic special speckled keyword keywor");
        testLexer lexer = new testLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        testParser parser = new testParser(tokens);
        parser.sentence();
    }
}

As you can see, when parsing the input:

apple basic special speckled keyword keywor

the following output is generated:

LETTER  :: 'a'
LETTER  :: 'p'
LETTER  :: 'p'
LETTER  :: 'l'
LETTER  :: 'e'
SPACE   :: ' '
LETTER  :: 'b'
LETTER  :: 'a'
LETTER  :: 's'
LETTER  :: 'i'
LETTER  :: 'c'
SPACE   :: ' '
SPECIAL :: 'special'
SPACE   :: ' '
LETTER  :: 's'
LETTER  :: 'p'
LETTER  :: 'e'
LETTER  :: 'c'
LETTER  :: 'k'
LETTER  :: 'l'
LETTER  :: 'e'
LETTER  :: 'd'
SPACE   :: ' '
SPECIAL :: 'keyword'
SPACE   :: ' '
LETTER  :: 'k'
LETTER  :: 'e'
LETTER  :: 'y'
LETTER  :: 'w'
LETTER  :: 'o'
LETTER  :: 'r'

See the Q&A What is a 'semantic predicate' in ANTLR? to learn more about predicates in ANTLR.

继续阅读：antlr grammar

ANTLRv3 not reading options

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？