ANTLR: grammar for the R5RS lexical structure, problem with numbers

2023-03-11 10:12 问答作者：

i'm implementing an IDE for scheme in eclipse using DLTK. So far, i am programming the grammar to recognize the lexical structure.

i'm following the official EBNF which can be viewed here:

http://rose-r5rs.googlecode.com/hg/doc/r5rs-grammar.html

i can't get a simple form of the numbers grammar getting worked. for example the decimal numbers, i have

grammar r5rsnumbers;

options {
  language = Java;
}


program:
NUMBER;

// NUMBERS


NUMBER : /*NUM_2 | NUM_8 |*/ NUM_10; //| NUM_16;
fragment NUM_10 : PREFIX_10 COMPLEX_10;
fragment COMPLEX_10 
: REAL_10 (
            '@' REAL_10
            | '+' (
                    UREAL_10 'i'
                    | 'i'
                    )?  
            | '-' (
                    UREAL_10 'i'
                    | 'i'
                    )?
            )?
    | '+' (
        UREAL_10 'i'
        | 'i'
        )?  
    | '-' (
        UREAL_10 'i'
        | 'i'
        )?;

fragment REAL_10 : SIGN UREAL_10;
fragment UREAL_10 
    : UINTEGER_10 ('/' UINTEGER_10)?
    | DECIMAL_10;
fragment UINTEGER_10 : DIGIT_10+ '#'*;

fragment DECIMAL_10 
    : UINTEGER_10 SUFFIX
    | '.' DIGIT_10+ '#'* SUFFIX
    | DIGIT_10+ '.' DIGIT_10* '#'* SUFFIX
    | DIGIT_10+ '#'+ '.' '#'* SUFFIX;

fragment PREFIX_10 
    : RADIX_10  EXACTNESS
    | EXACTNESS RADIX_10;

fragment DIGIT : '0'..'9';
fragment EMPTY : '""'; // empty is the empty string
fragment SUFFIX : EMPTY | EXPONENT_MARKER SIGN DIGIT_10+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l';
fragment SIGN : EMPTY | '+' |  '-';
fragment EXACTNESS : EMPTY | '#i' | '#e';
fragment RADIX_10 : EMPTY | '#d';
fragment DIGIT_10 : DIGIT;

the problem is, it is not recognizing anything. i don't understand the warning i get from the PREFIX_10 or how to solve it. if i don't use fragment in the rules, the file isn't compiling since he complains about the DIGIT_10 rule matching the same input as almost all other prior rules.

it's the same with num_2, 开发者_如何转开发num_8 and num_16

plus, i am not sure with my solution of the empty-string.

how do i get around here?

Note that your ANTLR rule:

EMPTY : '""';

does not match an empty string, but two double quotes.

But you don't want a lexer rule to match only an empty string: that will cause it to go in an infinite loop since there are an infinite amount of empty strings in any string/source.

So the BNF rules:

<real 10>
    ::= <sign> <ureal 10>

<sign>
    ::= <empty> | {+} | {-}

should not be translated as the following ANTLR rules:

REAL_10 
  :  SIGN UREAL_10
  ;

SIGN 
  :  EMPTY 
  |  '+' 
  |  '-'
  ;

but like this instead:

REAL_10 
  :  SIGN? UREAL_10
  ;

SIGN 
  :  '+' 
  |  '-'
  ;

Also note that your rule:

fragment COMPLEX_10 
: REAL_10 (
            '@' REAL_10
            | '+' (
                    UREAL_10 'i'
                    | 'i'
                    )?  
            | '-' (
                    UREAL_10 'i'
                    | 'i'
                    )?
            )?
    | '+' (
        UREAL_10 'i'
        | 'i'
        )?  
    | '-' (
        UREAL_10 'i'
        | 'i'
        )?;

is a bit hard to read. Indenting it differently might make this a bit easier to comprehend:

fragment COMPLEX_10
  :  REAL_10 ( '@' REAL_10 
             | '+' (UREAL_10 'i' | 'i')? 
             | '-' (UREAL_10 'i' | 'i')?
             )?
  |  '+' (UREAL_10 'i' | 'i')?  
  |  '-' (UREAL_10 'i' | 'i')?
  ;

which could be simplified by writing:

fragment COMPLEX_10
  :  REAL_10 ('@' REAL_10)?
  |  REAL_10? ('+' | '-') UREAL_10? 'i'
  ;

Also be aware that many BNF notations make no distinction between lower- and uppercase literals. So instead of writing 'i' in your ANTLR grammar, you might want to use ('i' | 'I') instead.

EDIT

Sebastian wrote:

but i'm still having problems with the PREFIX_10 rule: fragment PREFIX_10 : RADIX_10? EXACTNESS? | EXACTNESS? RADIX_10?; which tells me that alternative 2 can never be matched, although it should match #i #d and #d #i with the 2 alternatives seperately or am i doing something wrong here?

There are a couple of things wrong with the (fragment) rule PREFIX_10:

fragment PREFIX_10 
  :  RADIX_10? EXACTNESS? // alternative 1
  |  EXACTNESS? RADIX_10? // alternative 2
  ;

For one, both match an empty string. Because alternative 1 will always match an empty string, alternative 2 would never match, which is what ANTLR was telling you.

Now, looking at the BNF rules:

<exactness>
    ::= <empty> | {#i} | {#e}

<prefix 10>
    ::= <radix 10> <exactness>
      | <exactness> <radix 10>

<radix 10>
    ::= <empty> {#d}

(Note that <empty> {#d} equals {#d}, so the <empty> is IMO just misplaced. All other radii don't have and <empty> part)

I'd translate those into the following (untested!) ANTLR rules:

fragment EXACTNESS
  :  '#i' 
  |  '#e'
  ;

fragment PREFIX_10
  :  RADIX_10 EXACTNESS?
  |  EXACTNESS RADIX_10 // **
  ;

fragment RADIX_10
  :  '#d'
  ;

** Note that it's not:

fragment PREFIX_10
  :  RADIX_10 EXACTNESS? // matches '#d'
  |  EXACTNESS? RADIX_10 // matches '#d'
  ;

because the lexer does not know through which alternative to match #d.

And in case the BNF rule for <radix 10> should be like this (ie. they forgot to place a |):

<radix 10>
    ::= <empty> 
      | {#d}

then the ANTLR PREFIX_10 should still look like:

fragment PREFIX_10
  :  RADIX_10 EXACTNESS?
  |  EXACTNESS RADIX_10
  ;

but then all other rules that use PREFIX_10 should make PREFIX_10 optional.

HTH

继续阅读：antlr grammar numbers r5rs scheme

ANTLR: grammar for the R5RS lexical structure, problem with numbers

EDIT

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

EDIT

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？