lexer/parser ambiguity

2022-12-27 07:03 问答作者：

How does a lexer solve this ambiguity?

/*/*/

How is it that it doesn't just say, oh yeah, that's the begining of a multi-line comment, followed by another multi-line comment.

Wouldn't a greedy lexer just return the following tokens?

I'm in the midst of writing 开发者_JS百科a shift-reduce parser for CSS and yet this simple comment thing is in my way. You can read this question if you wan't some more background information.

UPDATE

Sorry for leaving this out in the first place. I'm planning to add extensions to the CSS language in this form /* @ func ( args, ... ) */ but I don't want to confuse an editor which understands CSS but not this extension comment of mine. That's why the lexer just can't ignore comments.

One way to do it is for the lexer to enter a different internal state on encountering the first /*. For example, flex calls these "start conditions" (matching C-style comments is one of the examples on that page).

The simplest way would probably be to lex the comment as one single token - that is, don't emit a "START COMMENT" token, but instead continue reading in input until you can emit a "COMMENT BLOCK" token that includes the entire /*(anything)*/ bit.

Since comments are not relevant to the actual parsing of executable code, it's fine for them to basically be stripped out by the lexer (or at least, clumped into a single token). You don't care about token matches within a comment.

In most languages, this is not ambiguous: the first slash and asterix are consumed to produce the "start of multi-line comment" token. It is followed by a slash which is plain "content" within the comment and finally the last two characters are the "end of multi-line comment" token.

Since the first 2 characters are consumed, the first asterix cannot also be used to produce an end of comment token. I just noted that it could produce a second "start of comment" token... oops, that could be a problem, depending on the amount of context is available for the parser.

I speak here of tokens, assuming a parser-level handling of the comments. But the same applies to a lexer, whereby the underlying rule is to start with '/*' and then not stop till '*/' is found. Effectively, a lexer-level handling of the whole comment wouldn't be confused by the second "start of comment".

Since CSS does not support nested comments, your example would typically parse into a single token, COMMENT. That is, the lexer would see /* as a start-comment marker and then consume everything up to and including a */ sequence.

Use the regexp's algorithm, search from the beginning of the string working way back to the current location.

if (chars[currentLocation] == '/' and chars[currentLocation - 1] == '*') {
  for (int i = currentLocation - 2; i >= 0; i --) {
    if (chars[i] == '/' && chars[i + 1] == '*') {
      // .......
    }
  }
}

It's like applying the regexp /\*([^\*]|\*[^\/])\*/ greedy and bottom-up.

One way to solve this would be to have your lexer return:

/
*
/
*
/

And have your parser deal with it from there. That's what I'd probably do for most programming languages, as the /'s and *'s can also be used for multiplication and other such things, which are all too complicated for the lexer to worry about. The lexer should really just be returning elementary symbols.

If what the token is starts to depend too much on context, what you're looking for may very well be a simpler token.

That being said, CSS is not a programming language so /'s and *'s can't be overloaded. Really afaik they can't be used for anything else other than comments. So I'd be very tempted to just pass the whole thing as a comment token unless you have a good reason not to: /\*.*\*/

继续阅读：lexer

lexer/parser ambiguity

UPDATE

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

UPDATE

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？