
Excluding certain elements from a specified set in a Parsing Expression Grammar (PEG.js)?

I am writing a lexer for Haskell in JavaScript using a Parsing Expression Grammar; the implementation I use is PEG.js.

I have a problem with making it work for reserved words, as demonstrated in a simplified form here:

program = ( word / " " )+  
word = ( reserved / id )  
id = ( "a" / "b" )+  
reserved = ( "aa" )

The point here is to get a series of tokens that are either arbitrary sequences of a:s and/or b:s or the sequence "aa", and they are separated by spaces.

What I actually get is either that every token that is not a space is recognized as id, or that a token that should be recognized as id has all its initial pairs of a:s consumed as reserved, e.g.

"aab" gets recognized as reserved "aa" followed by id "b".

The way the Haskell lexical specification solves this ambiguity is to specify id like this:

id = ( "a" / "b" )+[BUT NOT reserved]

I have tried replicating this using various combinations of the PEG ! and & operators to achieve the same effect, but have not found a way to get this to work properly.

The solution:

id = !reserved ( "a" / "b" )+

that I've seen suggested in several places does not work.

Is this a limitation of this particular PEG implementation, of PEG itself, or (hopefully) of my method?

Thanks in advance!


!reserved ident is a perfectly acceptable technique in any PEG implementation, and PEG.js seems to support it as well. By the way, you should add !id at the end of the definition of reserved, so that the keyword only matches when no further identifier characters follow it.
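To make the lookahead idea concrete, here is a minimal sketch that emulates the PEG operators in plain JavaScript. None of these helpers (lit, seq, choice, plus, not, tokenize) are part of the PEG.js API; they are hypothetical stand-ins. The grammar being emulated is written in the comments. Note one assumption: the sketch guards reserved with !idChar (a single identifier character) rather than !id.

```javascript
// Hypothetical mini-combinators, NOT the PEG.js API.
// A parser takes (input, pos) and returns the new position, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);
const seq = (...ps) => (input, pos) => {
  for (const p of ps) {
    pos = p(input, pos);
    if (pos < 0) return -1;
  }
  return pos;
};
const choice = (...ps) => (input, pos) => {
  for (const p of ps) {
    const next = p(input, pos);
    if (next >= 0) return next; // ordered choice: the first success wins
  }
  return -1;
};
const plus = (p) => (input, pos) => {
  let end = p(input, pos);
  if (end < 0) return -1;
  // Keep matching greedily while the parser makes progress.
  for (let next = p(input, end); next > end; next = p(input, end)) end = next;
  return end;
};
// Negative lookahead (PEG "!"): succeeds without consuming input iff p fails.
const not = (p) => (input, pos) => (p(input, pos) < 0 ? pos : -1);

// Emulated grammar (idChar is a helper rule assumed for this sketch):
//   idChar   = "a" / "b"
//   reserved = "aa" !idChar
//   id       = !reserved idChar+
//   word     = reserved / id
const idChar = choice(lit("a"), lit("b"));
const reserved = seq(lit("aa"), not(idChar));
const id = seq(not(reserved), plus(idChar));

function word(input, pos) {
  let end = reserved(input, pos);
  if (end >= 0) return { type: "reserved", end };
  end = id(input, pos);
  return end >= 0 ? { type: "id", end } : null;
}

// program = ( word / " " )+
function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    if (input[pos] === " ") { pos += 1; continue; }
    const w = word(input, pos);
    if (w === null) throw new Error(`unexpected character at position ${pos}`);
    tokens.push({ type: w.type, text: input.slice(pos, w.end) });
    pos = w.end;
  }
  return tokens;
}
```

With this sketch, tokenize("aa aab") yields a reserved token "aa" followed by a single id token "aab". The reason for using !idChar instead of !id: as far as I can trace it, when reserved and id each look ahead at the other, an input such as "aaaa" can end up split into two reserved tokens, because the inner id is rejected by its own !reserved check.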


As far as I know, PEG choices are ordered. That basically means that the alternatives of a choice are tried deterministically from the first to the last one, and the first that matches wins. That said, you just need to try the "reserved" rule before the "identifier" one.
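A small sketch of what ordered choice does with the toy grammar from the question, emulated in plain JavaScript rather than PEG.js. The helpers matchAt and firstMatch are hypothetical names introduced for illustration. It suggests that ordering decides which alternative wins, but that with the unguarded reserved = "aa" the reserved-first order still splits "aab", so ordering probably has to be combined with a lookahead.

```javascript
// Hypothetical helper: match a regular expression exactly at `pos`,
// returning the matched text or null. Not part of PEG.js.
const matchAt = (re) => (input, pos) => {
  const sticky = new RegExp(re.source, "y"); // "y" anchors the match at lastIndex
  sticky.lastIndex = pos;
  const m = sticky.exec(input);
  return m ? m[0] : null;
};

const reserved = matchAt(/aa/);   // reserved = "aa"
const id = matchAt(/[ab]+/);      // id = ( "a" / "b" )+

// Ordered choice: try alternatives in the given order, commit to the first success.
function firstMatch(alternatives, input, pos = 0) {
  for (const [type, parser] of alternatives) {
    const text = parser(input, pos);
    if (text !== null) return { type, text };
  }
  return null;
}

const reservedFirst = [["reserved", reserved], ["id", id]]; // word = reserved / id
const idFirst = [["id", id], ["reserved", reserved]];       // word = id / reserved
```

Here firstMatch(reservedFirst, "aab") commits to reserved after consuming only "aa", while firstMatch(idFirst, "aa") classifies the keyword itself as id. These are the two failure modes described in the question.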
