Writing correct LL(1) grammars?

2023-02-11 16:43 问答作者：

I'm currently trying to write a (very) small interpreter/compiler for a programming language. I have set the syntax for the language, and I now need to write down the grammar for the language. I intend to use an LL(1) parser because, after a bit of research, it seems that it is the easiest to use.

I am new to this domain, but from what I gathered, formalising the syntax using BNF or EBNF is highly recommended. However, it seems that not all grammars are suitable for implementation using an LL(1) parser. Therefore, I was wondering what was the correct (or recommended) approach to writing grammars in LL(1) form.

Thank you for your help, Charlie.

PS: I intend to write the parser using Haskell's Parsec library.

EDIT: Also, according to SK-logic, Parsec can handle an infinite lo开发者_开发百科okahead (LL(k) ?) - but I guess the question still stands for that type of grammar.

I'm not an expert on this as I have only made a similar small project with an LR(0) parser. The general approach I would recommend:

Get the arithmetics working. By this, make rules and derivations for +, -, /, * etc and be sure that the parser produces a working abstract syntax tree. Test and evaluate the tree on different input to ensure that it does the arithmetic correctly. Make things step by step. If you encounter any conflict, resolve it first before moving on.
Get simper constructs working like if-then-else or case expressions working.
Going further depends more on the language you're writing the grammar for.

Definetly check out other programming language grammars as an reference (unfortunately I did not find in 1 min any full LL grammar for any language online, but LR grammars should be useful as an reference too). For example:

ANSI C grammar

Python grammar

and of course some small examples in Wikipedia about LL grammars Wikipedia LL Parser that you probably have already checked out.

I hope you find some of this stuff useful

There are algorithms both for determining if a grammar is LL(k). Parser generators implement them. There are also heuristics for converting a grammar to LL(k), if possible.

But you don't need to restrict your simple language to LL(1), because most modern parser generators (JavaCC, ANTLR, Pyparsing, and others) can handle any k in LL(k).

More importantly, it is very likely that the syntax you consider best for your language requires a k between 2 and 4, because several common programming constructs do.

So first off, you don't necessarily want your grammar to be LL(1). It makes writing a parser simpler and potentially offers better performance, but it does mean that you're language will likely end up more verbose than commonly used languages (which generally aren't LL(1)).

If that's ok, your next step is to mentally step through the grammar, imagine all possibilities that can appear at that point, and check if they can be distinguished by their first token.

There's two main rules of thumb to making a grammar LL(1)

If of multiple choices can appear at a given point and they can start with the same token, add a keyword in front telling you which choice was taken.
If you have an optional or repeated part, make sure it is followed by an ending token which can't appear as the first token of the optional/repeated part.
Avoid optional parts at the beginning of a production wherever possible. It makes the first two steps a lot easier.

继续阅读：grammar parsing recursive-descent

Writing correct LL(1) grammars?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？