开发者

Context-sensitivity vs Ambiguity

I'm confused about how context-sensitivity and ambiguity influence each other.

What i think is correct is:

Ambiguity:

An ambiguous grammar leads to the construction of more than one parse-tree using either left or right derivation. A language where all possible grammars are ambiguous is an ambiguous language.

For instance C++ is an ambiguous language because x * y always can mean two different things as discussed in: Why can't C++ be parsed with a LR(1) parser?.

Context-sensitivity:

A context-sensitive grammar has rules where the left hand side of these rules may contain (non)terminal symbols additional to the one nonterminal required within the lhs of all rules of different kinds of grammars. Which means that you cannot just replace a nonterminal while descending. Instead you have to look at the surrounding nonterminals first.


Now what bothers me are statements that more or less say that context-sensitive parsers can parse ambiguities like x * y. For instance in the above linked question it is stated that "... a parser that does [decorating a syntax tree while creating it] isn't context free, and LR parsers (the pure ones) are context free." In my opinion this implies that context-sensitive parsers (the opposite of context-free parsers?) can do this. Another example would be Is any part of C++ syntax context sensitive? where this question is answered with "Yes ...". Same here: what is an ambiguous context free grammar?

I don't see what this C++ ambiguity has to do with context-sensitivity. I don't think that there is any context-sensitive grammar that can deal with this ambiguity. For instance if you take a fictional rule like Typedef, <other>*, PointerDeclaration -> Ident "*" Ident

then you still would not be able to determine (with pure parsing) whether the concrete first Ident (e.g. "x") was used during the Typedef (e.g. typedef double x;).


So is it possible that the term "context-sensitivity" is used within the linked questions although meaning something simple like context-dependency (e.g. more information needed than provided by a simple parser). Or is there any kind of connection between "real" context-sensitivity" and ambiguities.

Edit More specified question: Is there any kind of ambiguities within context-free grammars that can be dealt with by using context-sensitive grammars. This question occurs to me because in the linked questions it sounds like the C++ ambiguity is sometimes referred to as an context-sensitivity problem.

Edit2 Additional Info: The book Computer Systems states on page 346 that requirements such as having the same amount of actual and formal parameters can be expressed by context-sensitive grammars. But this is very cumbersome because you would need lots of complexe rules. But maybe this could also apply to the C++ ambiguity mentioned earlier. So that we have rules like

"Typedef double x", <other>*, PointerDeclaration -> "x" "*" Ident

Of course such rules would be very limited and you would need a huge amount to express every possibility. At least t开发者_Python百科his could be an approach to the answer of the question if (theoratically) context-free free ambiguities can be replaced with the usage of context-sensitive rules


Context-sensitivity and ambiguity are entirely orthogonal. There exist ambiguous context-free languages and unambiguous context-sensitive languages.

A context-sensitive language is a formal language that can be parsed by a context-sensitive grammar (CSG). Every context-free language is also a context-sensitive language since context-free grammars are simplified context-sensitive languages. Not every formal language is context-sensitive though; there are languages that even a CSG cannot describe.


If you want to parse a context-sensitive language with a context-free parser, you define an a context-free grammar that accepts a superset of a context-sensitive language (because they are less powerful). Because you accept a superset, you might get ambiguities or false-positive results, which must be resolved after the parsing.

Example One: a XML-like language that allows for any tag name. Because context-free grammar cannot parse a sentence ww that consists of two repetitive words w = {a,b}+, it cannot parse <ID></ID> where IDs are equal as well. Thus one defines a context-free grammar that accepts a superset:

start -> elem
elem  -> open elem* close
open  -> '<' ID '>'
close -> '</' ID '>'
ID    -> ('a' / 'b')+

This obviously parses even the sentences that one doesn't want, therefore an extra check for equivalent IDs in open and close has to be done.

Example Two: C-like Typedef in a very simple language. Imagine a language that contains only typedef, pointers and multiplications. It has only two IDs, a and b. An example of such a language:

typedef a;
b * a;                    // multiplication
a * b;                    // b is pointer to type a 

The context-free grammar would be like:

start                     -> typedef multiplication-or-pointer+
typedef                   -> 'typedef' ID ';'
multiplication-or-pointer -> ID '*' ID ';'
ID                        -> 'a'
ID                        -> 'b'

The grammar does not accept superset, but it does not know if it sees multiplication or pointer, thus it is ambiguous. And therefore one has to go through the result and decide, if it is multiplication or pointer, depending what type is defined in typedef.

With context-sensitive grammar one can do much more. Very roughly (and imprecisely):

start                                     -> typedef multiplication-or-pointer+
typedef                                   -> 'typedef' ID ';'
multiplication-or-pointer                 -> pointer / multiplication
'typedef' 'a' ';' WHATEVER pointer        -> 'a' '*' ID   
'typedef' 'b' ';' WHATEVER pointer        -> 'b' '*' ID   
'typedef' 'b' ';' WHATEVER multiplication -> 'a' '*' ID
'typedef' 'a' ';' WHATEVER multiplication -> 'b' '*' ID
ID                                        -> 'a'
ID                                        -> 'b'

Please note, that what I show here is not precise, because I limited number of IDs. In general, there can be an infinite number of IDs. You can write a context-sensitive grammar for a general case (even though it must be absolutely unintuitive), but you can't write a context free grammar.

Regarding your Edit 1: I hope the previous example answers that.

Regarding your Edit 2: There are another tricks how to express that so the rules are not so limited, but they are usually mind-blowing and IMO it is the reason why nobody uses CSG formalism.

NB: context-sensitive grammar is equivalent to a linear bounded automaton, context-free grammar is equivalent to a pushdown automaton. It is not right to say that context-free parser is an opposite of context-sensitive parser.


Compilers do not use "pure" (whatever that may mean) grammars to do their parsing - they are real-world programs that do what all real-world programs do - apply heuristics in certain situations. This is why C++ compilers (and compilers for most other languages, except for undergrad exercises) are not produced using compiler generators.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜