Ambiguous grammar with Lemon Parser Generator
I want to parse structured CSS code in PHP, using a lexer/parser generated by the PEAR packages PHP_LexerGenerator and PHP_ParserGenerator. My goal is to parse files like this:
selector, selector2 {
prop: value;
prop2 /*comment */ :
value;
subselector {
prop: value;
subsub { prop: value; }
}
}
This all works as long as I don't have pseudo-classes. A pseudo-class lets you append : and a CSS name ([a-z][a-z0-9]*) to an element, as in a.menu:visited. Being somewhat lazy, I gave the parser no list of valid pseudo-classes; it accepts anything as the class name.
My grammar (ignoring all the special cases and whitespace) looks like this:
document ::= (<rule>)*
rule ::= <selector> '{' (<content>)* '}'
content ::= <rule>
content ::= <definition>
definition ::= <name> ':' <name> ';'
// h1 .class.class2#id :visited
<selector> ::= <name> (('.'|'#') <name>)* (':' <name>)?
Now, when I try to parse the following
h1 {
test:visited {
simple: case;
}
}
The parser complains that it expected a <name> to follow the double colon. So it tries to read simple: as a <selector> (just look at the syntax highlighting of SO).
Is it my error that the parser cannot backtrack far enough to try the <definition> rule? Or is Lemon just not powerful enough to express this? If so, what can I do to get a parser working with this grammar?
Your question asks about PHP_ParserGenerator and PHP_LexerGenerator. The parser generator code is marked as 'not maintained', which bodes ill.
The syntax you are using for the grammar is not acceptable to Lemon, so you need to clarify why you think the parser generator should accept it. You mention a problem with 'expected a <name> to follow the double colon', but neither your grammar nor your sample input has a double colon, which makes it hard to help you.
I think this Lemon grammar is equivalent to the one you showed:
document ::= rule_list.
rule_list ::= .
rule_list ::= rule_list rule.
rule ::= selector LBRACE content_list RBRACE.
content_list ::= .
content_list ::= content_list content.
content ::= rule.
content ::= definition.
definition ::= NAME COLON NAME SEMICOLON.
selector ::= NAME opt_dothashlist opt_colonname.
opt_dothashlist ::= .
opt_dothashlist ::= dot_or_hash NAME.
dot_or_hash ::= DOT.
dot_or_hash ::= HASH.
opt_colonname ::= COLON NAME.
However, when it is compiled, Lemon complains about '1 parsing conflicts', and the output file shows:
State 2:
definition ::= NAME * COLON NAME SEMICOLON
selector ::= NAME * opt_dothashlist opt_colonname
(10) opt_dothashlist ::= *
opt_dothashlist ::= * dot_or_hash NAME
dot_or_hash ::= * DOT
dot_or_hash ::= * HASH
COLON shift 10
COLON reduce 10 ** Parsing conflict **
DOT shift 13
HASH shift 12
opt_dothashlist shift 5
dot_or_hash shift 7
This means it is not sure what to do with a colon; it might be the 'opt_colonname' part of a 'selector' or it might be part of a 'definition':
name1:name4 : name2:name3 ;
Did you mean to allow syntax such as that? Nominally, according to the grammar, that should be valid, but
name1:name4;
should also be valid. I think it requires 2 or 3 lookahead tokens to disambiguate these (so your grammar is not LALR(1) but LALR(3)).
Review your definition of 'selector' in particular.
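One possible way to do that review is to left-factor 'selector' so that it has no empty alternatives in front of the colon. The sketch below is an assumption on my part, not something I have run through Lemon or PHP_ParserGenerator: the non-terminals 'qualifier' and 'qualifier_list' are names I made up, and the token names are the same ones used in the grammar above. The idea is that after a NAME the parser can simply shift COLON and the following NAME, and only then choose between 'definition' (next token SEMICOLON) and 'selector' (next token LBRACE), which needs a single token of lookahead.

```
document       ::= rule_list.
rule_list      ::= .
rule_list      ::= rule_list rule.
rule           ::= selector LBRACE content_list RBRACE.
content_list   ::= .
content_list   ::= content_list content.
content        ::= rule.
content        ::= definition.
definition     ::= NAME COLON NAME SEMICOLON.

// Every selector alternative is spelled out, so no empty rule has to
// be reduced before the parser has seen what follows NAME COLON NAME.
selector       ::= NAME.
selector       ::= NAME COLON NAME.
selector       ::= NAME qualifier_list.
selector       ::= NAME qualifier_list COLON NAME.

// One or more '.name' / '#name' parts (hypothetical helper rules).
qualifier_list ::= qualifier.
qualifier_list ::= qualifier_list qualifier.
qualifier      ::= DOT NAME.
qualifier      ::= HASH NAME.
```

With this shape, the sample input from the question should go through as follows: 'test:visited' followed by LBRACE reduces to a 'selector', while 'simple: case' followed by SEMICOLON reduces to a 'definition'. It still assumes the lexer skips whitespace and comments and returns 'visited' and 'case' as plain NAME tokens.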