开发者

What are the differences between PEGs and CFGs?

From this wikipedia page:

The fundamental difference between context-free grammars and parsing expression grammars is that the PEG's choice operator is ordered. If the first alternative succeeds, the second alternative is ignored. Thus ordered choice is not commutative, unlike unordered choice as in context-free grammars and regular expressions. Ordered choice is analogous to soft cut operators available in some logic programming languages.

Why does PEG's choice operator short circuits the matching? Is it because to minimize memory usage (due to memoization)?

I'm not sure what the choice operator is in regular expressions but let's suppose it is this: /[aeiou]/ to match a vowel. So this regex is commu开发者_如何学编程tative because I could have written it in any of the 5! (five factorial) permutations of the vowel characters? i.e. /[aeiou]/ behaves the same as /[eiaou]/. What is the advantage of it being commutative? (c.f. PEG's non-commutativity)

The consequence is that if a CFG is transliterated directly to a PEG, any ambiguity in the former is resolved by deterministically picking one parse tree from the possible parses. By carefully choosing the order in which the grammar alternatives are specified, a programmer has a great deal of control over which parse tree is selected.

Is this saying that PEG's grammar is superior to CFG's?


A CFG grammar is non-deterministic, meaning that some input could result in two or more possible parse-trees. Though most CFG-based parser-generators have restrictions on the determinability of the grammar. It will give a warning or error if it has two or more choices.

A PEG grammar is deterministic, meaning that any input can only be parsed one way.

To take a classic example; The grammar

if_statement := "if" "(" expr ")" statement "else" statement
              | "if" "(" expr ")" statement;

applied to the input

if (x1) if (x2) y1 else y2

could either be parsed as

if_statement(x1, if_statement(x2, y1, y2))

or

if_statement(x1, if_statement(x2, y1), y2)

A CFG-parser would generate a Shift/Reduce-conflict, since it can't decide if it should shift (read another token), or reduce (complete the node), when reaching the "else" keyword. Of course, there are ways to get around this problem.

A PEG-parser would always pick the first choice.

Which one is better is for you to decide. My opinion is that often PEG-grammars is easier to write, and CFG grammars easier to analyze.


I think you're confusing CFG with LR and with ambiguity. Grammars are not deterministic/nondeterministic, though their parsers may be. An ambiguous grammar is still CFG if it complies with the definition, and a deterministic parser can be built for it doing what PEG does.


PEGs and CFGs are two different ways of specifying a language. If you write a parser by hand, chances are very good that you will write a so-called recursive descent parser. A recursive descent parser will automatically resolve any ambiguities in your grammar, but does so silently and likely not in the way you would have wanted. The problem with this is that you never find out that there were ambiguities that got automatically resolved, unless you thoroughly test your parser. PEGs are basically a formalization of recursive descent parsers, and so come with this problem. For examples of this problem see How does backtracking affect the language recognized by a parser?, and https://cs.stackexchange.com/questions/143480/dragon-book-4-4-5-exercise/143975.

CFGs have a lot of theory to back them up, but PEGs not so much. The set of languages that can be encoded by CFG and those that can be encoded by PEG partially overlap, but neither encompasses the other.

For a more thorough review of this I recommend the excellent essay Which Parsing Approach?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜