Deriving a state machine from a BNF grammar
I am trying to put together a proof of concept of an XSS-safe string interpolation scheme.
Given a string with substitutions,
"Hello <b>$planetoid</b>!"
I want break it into literal portions and substitutions ("Hello<b>" planetoid "</b>!")
and then run a state machine left to right over the literal portions. When I reach an interpolated value (planetoid
in the above), I need to be able to get from the state to an appropriate escaping function.
Does anyone know of any examples of how to use lex/yacc/bison to derive a state machine and be able to associate labels in the grammar with output states? I want to derive a state machine that I can use both in javascript, and to try and replace PHP's underlying string implementation.
My reasons for doing this are describe开发者_运维百科d here.
cheers, mike
In general, it is not possible to create a state machine for grammars that can be represented in BNF. State machines can only recognize regular languages and BNF can specify context-free languages. Yacc can create parsers. Would that be sufficient?
It looks like I can put markers in the grammar, so if I use two different production types, one with no side effect, and that consumes characters, and one that consumes no characters, but updates a state variable
ST_EXPECT_TAG_NAME : { state = TAG_NAME };
TAG_BODY
: '<' ST_EXPECT_TAG_NAME TAG_NAME ATTRS SPACES '>' ST_OUT_OF_TAG
;
the compiled output associates state names in a switch statement
YY_REDUCE_PRINT (yyn);
switch (yyn)
{
case 118:
#line 74 "tmp/html-combo.y"
{ state = TAG_NAME ;}
break;
There may be a way to extract the tables without parsing C, but I'm so ignorant of yacc/bison.
You can use yacc/bison for this. Initially looking at bison its hard to tell where you can implement a state machine. Rules in bison are resolved left-right-up. Ie; If you have a rule (called rule0) which derives rule1 rule2 rule3, the actions are called in this order: rule1,rule2,rule3,rule0. You can combine that using a global state machine, with dynamic return values for rules (I use a union with different types like a string, an int or even a container for return values).
精彩评论