Parsing CSS 2.1 with the correct CSS parsing conventions in ANTLR
The CSS 2.1 grammar comes with a strong advisory against using it directly to parse CSS, "since it does not express the parsing conventions, only the CSS 2.1 syntax."
Indeed, any parser that ignores these parsing conventions (as ours currently does) runs into problems with pages that contain errors or unknown constructs.
Therefore, we would like our CSS 2.1 ANTLR parser, which does not currently follow the forward-compatible and error-handling parsing conventions, to somehow use the parse tree produced by the basic (core) grammar, which does incorporate those conventions. (That core parse tree could perhaps be generated by a second ANTLR parser.)
Is this a reasonable approach? Are there well-understood techniques for doing this?
To reiterate, the goal is to produce a robust CSS 2.1 parser that handles errors and new constructs gracefully, in accordance with the CSS parsing conventions.
We went with the general approach described above, and it worked.
Briefly, we have two ANTLR parsers: one for the core CSS grammar and another for the CSS 2.1 grammar. The CSS 2.1 parser can run independently of the core CSS parser, but that is not how we actually use it.
The core CSS parser builds a basic parse tree. Its rule actions re-parse the matched text through the appropriate entry points of the CSS 2.1 grammar, producing the same C# objects that the CSS 2.1 parser would have produced when run standalone. For example, the ruleset action in the core CSS parser re-parses the matched text using the ruleset entry point of the CSS 2.1 grammar and adds the resulting objects to its result.
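To make the two-pass idea concrete, here is a minimal Python sketch. It is not ANTLR and not the authors' code: the hypothetical `top_level_split` stands in for the core grammar (it accepts anything with balanced brackets and strings), while the hypothetical `parse_ruleset` and `parse_declaration` stand in for the strict CSS 2.1 entry points. They recognize only a tiny selector subset and ignore comments and at-rules. Each ruleset's matched text is re-parsed strictly; following the CSS 2.1 error-handling rules, an unparsable selector drops the whole ruleset, while a malformed declaration drops only that declaration.

```python
import re
from dataclasses import dataclass, field


# Hypothetical result objects, standing in for the C# objects.
@dataclass
class Declaration:
    prop: str
    value: str


@dataclass
class Ruleset:
    selector: str
    declarations: list = field(default_factory=list)


def top_level_split(text):
    """Permissive first pass (stand-in for the core grammar): cut text into
    statements at top-level ';' and at each '}' that closes a top-level block,
    respecting (), [], {} nesting and quoted strings."""
    chunks, depth, start, quote, i = [], 0, 0, None, 0
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1                    # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)
            if ch == "}" and depth == 0:  # end of a block statement
                chunks.append(text[start:i + 1])
                start = i + 1
        elif ch == ";" and depth == 0:    # end of a simple statement
            chunks.append(text[start:i + 1])
            start = i + 1
        i += 1
    chunks.append(text[start:])           # whatever is left at end of input
    return [c.strip() for c in chunks if c.strip()]


# Stand-in for the strict CSS 2.1 grammar: type/class/id/pseudo-class simple
# selectors, descendant, '>' and '+' combinators, comma-separated groups.
IDENT = r"-?[A-Za-z_][A-Za-z0-9_-]*"
SIMPLE = rf"(?:(?:{IDENT}|\*)(?:[.#:]{IDENT})*|(?:[.#:]{IDENT})+)"
SELECTOR = rf"{SIMPLE}(?:\s*[>+]\s*{SIMPLE}|\s+{SIMPLE})*"
GROUP = re.compile(rf"{SELECTOR}(?:\s*,\s*{SELECTOR})*")
DECLARATION = re.compile(rf"({IDENT})\s*:\s*([^{{}}]+)", re.S)


def parse_declaration(text):
    """Strict 'declaration' entry point; returns None if malformed."""
    m = DECLARATION.fullmatch(text.rstrip(";").strip())
    return Declaration(m.group(1), m.group(2).strip()) if m else None


def parse_ruleset(text):
    """Strict 'ruleset' entry point, applied to text the first pass matched."""
    selector, brace, block = text.partition("{")
    if not brace or not block.endswith("}"):
        return None
    if not GROUP.fullmatch(selector.strip()):
        return None                       # bad selector: drop the whole ruleset
    rule = Ruleset(selector.strip())
    for chunk in top_level_split(block[:-1]):
        decl = parse_declaration(chunk)
        if decl is not None:              # bad declaration: drop just that one
            rule.declarations.append(decl)
    return rule


def parse_stylesheet(css):
    rules = []
    for statement in top_level_split(css):
        if statement.startswith("@"):
            continue                      # at-rules are not handled in this sketch
        rule = parse_ruleset(statement)
        if rule is not None:
            rules.append(rule)
    return rules
```

On `h1 { color: red; 12px: bad; margin: 0 } p..q { color: blue }` it keeps the `h1` rule without its `12px` declaration and drops the `p..q` rule. The point of this split is that error recovery lives entirely in the permissive pass; the strict parsers only accept or reject a piece of text that the first pass has already delimited.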
A couple of important points that took us a lot of time to figure out:
ANTLR parser rules that are called from external code (entry points) handle EOF differently from rules that are called by other rules.
The core CSS grammar needs to be augmented, without violating the parsing conventions, according to which level of CSS is actually being targeted. One example is the @media at-rule: its block contains rulesets that need to be parsed as far as possible using the parsing conventions before being handed over to the CSS 2.1 parser.
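The EOF point is easiest to see with a toy example. Our reading, which is an assumption since the answer does not spell it out, is that a rule invoked from code typically returns as soon as it has matched, without checking that the input is exhausted. A common ANTLR idiom for this is a wrapper entry rule such as `rulesetEntry : ruleset EOF ;` (the rule name is hypothetical). The Python sketch below is not ANTLR: it is a hand-written recursive-descent parser, with a hypothetical `Parser` class, that contrasts an entry point that ignores trailing input with one that demands EOF.

```python
import re

EOF = "<EOF>"


class ParseError(Exception):
    pass


def is_ident(tok):
    return re.fullmatch(r"[A-Za-z_][\w-]*", tok) is not None


class Parser:
    """Hypothetical hand-written parser for a toy grammar:
         ruleset : IDENT '{' (decl (';' decl)*)? ';'? '}' ;
         decl    : IDENT ':' IDENT ;
    """

    def __init__(self, text):
        # Identifiers, or any other single non-space character, then EOF.
        self.tokens = re.findall(r"[A-Za-z_][\w-]*|\S", text) + [EOF]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, ok, what):
        tok = self.peek()
        if not ok(tok):
            raise ParseError(f"expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def ruleset(self):
        # Like a rule invoked by another rule: it stops after the closing
        # '}' and leaves any following tokens for its caller.
        selector = self.expect(is_ident, "selector")
        self.expect(lambda t: t == "{", "'{'")
        decls = []
        while self.peek() != "}":
            prop = self.expect(is_ident, "property")
            self.expect(lambda t: t == ":", "':'")
            decls.append((prop, self.expect(is_ident, "value")))
            if self.peek() != ";":
                break
            self.pos += 1
        self.expect(lambda t: t == "}", "'}'")
        return selector, decls


def parse_ruleset_prefix(text):
    """Entry point with no EOF check: trailing input is silently ignored."""
    return Parser(text).ruleset()


def parse_ruleset_entry(text):
    """Entry point that also demands EOF, like 'rulesetEntry : ruleset EOF ;'."""
    parser = Parser(text)
    result = parser.ruleset()
    parser.expect(lambda t: t == EOF, "end of input")
    return result
```

Given `h1 { color: red } junk`, `parse_ruleset_prefix` happily returns the `h1` rule, while `parse_ruleset_entry` raises `ParseError` on `junk`. When a core parser hands exactly one ruleset's text to a strict entry point, the EOF-checking form is the one that actually confirms the whole text was valid.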
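The @media point can be sketched the same way. Handing a whole @media block to a strict parser would risk one bad ruleset inside it taking the entire block down; applying the conventions to the block's contents first lets each inner ruleset be kept or dropped on its own. In this minimal Python sketch, `top_level_split` is a hypothetical permissive splitter standing in for the core grammar, `strict_ruleset` is a hypothetical stand-in for the CSS 2.1 ruleset entry point that accepts only `ident { ... }`, and at-rules other than @media are simply ignored.

```python
import re


def top_level_split(text):
    """Permissive pass: cut text into statements at top-level ';' and at each
    '}' that closes a top-level block, respecting nesting and quoted strings."""
    chunks, depth, start, quote, i = [], 0, 0, None, 0
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1                    # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)
            if ch == "}" and depth == 0:
                chunks.append(text[start:i + 1])
                start = i + 1
        elif ch == ";" and depth == 0:
            chunks.append(text[start:i + 1])
            start = i + 1
        i += 1
    chunks.append(text[start:])
    return [c.strip() for c in chunks if c.strip()]


# Hypothetical strict stand-in: "ident { body-without-braces }" only.
RULESET = re.compile(r"([A-Za-z][\w-]*)\s*\{([^{}]*)\}")
MEDIUM = re.compile(r"[A-Za-z][\w-]*")


def strict_ruleset(text):
    m = RULESET.fullmatch(text)
    return (m.group(1), m.group(2).strip()) if m else None


def parse_statements(css, inside_media=False):
    result = []
    for stmt in top_level_split(css):
        if not inside_media and re.match(r"@media\b", stmt):
            head, _, body = stmt.partition("{")
            media = [m.strip() for m in head[len("@media"):].split(",")]
            if not body.endswith("}") or not all(MEDIUM.fullmatch(m) for m in media):
                continue                  # malformed @media: ignore the whole block
            # Apply the conventions to the block's contents first, so that
            # each inner ruleset is kept or dropped on its own.
            inner = parse_statements(body[:-1], inside_media=True)
            result.append(("@media", media, inner))
        elif stmt.startswith("@"):
            continue                      # other at-rules: ignored in this sketch
        else:
            rule = strict_ruleset(stmt)
            if rule is not None:          # invalid ruleset: dropped individually
                result.append(rule)
    return result
```

For `@media screen, print { h1 { color: red } p..q { x: y } em { a: b } }`, the invalid `p..q` ruleset is dropped while `h1` and `em` survive inside the @media entry.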
Hope this is helpful to others looking to do the same thing.