开发者

Are there such a thing as LL(0) parsers?

I saw a question somewhere asking the difference between LL(0) and LR(0) parsers. Is there such a thing as LL(0) par开发者_运维问答sers? If so, how do they parse without looking at any token?


LL(0) parsers do look at the tokens, but they don't decide which productions to apply upon them. They just determine if the sequence belongs to the language or not. This means that every non-terminal symbol must have a single right-hand side and that there may be no recursion.

G == ID name lastname
name == STRING
lastname == STRING

# lexer rules
# -----------
ID == [0-9]+
STRING == <unicode>+

Note that, as mentioned by @280Z28, a separate lexer is needed to deal with the variable length parts (ID and STRING), or the grammar will not be LL(0).

The sequence of productions to apply to parse an input with that grammar requires zero lookahead.

A lookahead is required for determinism when there's more than one production that could be applied after part of the given input sequence has been parsed.

In theory, a grammar generates a language, and, in that, ambiguity (having more than one way to derive a given phrase) is fine. In parsing, having one-and-only-one way is in the way of semantics (meaning), and it is what we want.

In parsing of programming languages, a lookahead is the information required to know which grammar production to use next.

In LL(0) languages, there are no choices, so the input sequence is either accepted and parsed, or rejected.


The k in LR(k) refers to the number of lookahead tokens. You always use at least one token in order to determine the action to perform. The Wikipedia page page has some more information on this.

Intuitively, the extra lookahead symbols let you make reduction choices with more information, so they allow larger classes of grammars to be expressed without conflicts.


When I took compilers, we never talked about them, though we did talk about LL(1). There's no mention of them on Wikipedia.

An LL(0) parser would mean that the parser could make a decision without knowing the next token in the stream. I would expect that if languages with that property exist, they're pretty darn rare.


You don't usually hear about LL(0) parsing, for the reason given in the other answers: nontrivial parsing requires seeing some input. However, parts of an LL(1) parser can indeed run as an LL(0) parser.

For example, here is a simple BNF grammar that only requires lookahead in one production:

S -> A
A -> B
B -> 'a' | 'b'

The B production has two right-hand-sides, corresponding to two separate strings in the input, 'a' and 'b'. So the parser has to see the input to choose the proper RHS.

However, neither the S nor the A production have any choices to be made. So, while they do in fact have associated FIRST sets (containing both 'a' and 'b'), their FIRST sets are not needed to make a parsing decision, meaning that the S -> A and A -> B products are an LL(0) subgrammar. So an optimization is to ignore FIRST sets for these two nonterminals.

To make this clear, suppose the input is the string 'b'. Then the parser can automatically generate the top-down derivations S -> A a and A -> B before even looking at the input (this is called an automaton). Then, for the innermost derivation, for B, it must look at the input to decide which B production to use to complete the parse tree.

A plus for this kind of optimization is that error detection (finding no input, or an input other than 'a' or 'b') can be done right at the point at which the input is examined, rather than when parsing some other production. If desired, then, the error message can reference not only the B production, but the A and S productions as well, since they have already been generated. If the FIRST set of S were examined without the optimization, then the error would have to be reported when all that is known is that S -> A is being attempted, which is much less context information for the user.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜