create automata while parsing the regular expression
I am trying to conve开发者_开发知识库rt a regular expression into NFA and I am having trouble in it. If you are unaware of the subject then this is a link to what I am saying here.
The problem here is that the author explains that given a string of characters you first convert it into postfix. He mentions that in real time it would be better to draw the NFA while parsing the R.E but has given no such method to do so.....
I am having problems on starting this up. Can anyone please guide me on what should be the algorithm to create NFA while parsing the string because the parenthesis are a big problem as they are supposed to be done first......
PS:- I am actually not sure what other tags should be placed in this....Also this is NOT homework
Here's a combined parser, NFA builder, and NFA interpreter in one page of Python. I hope I'm not spoiling the fun of figuring it out for yourself -- you might prefer to wait and keep hacking before following the link.
It's sort of like deinst's suggestion, but 'backwards'. As deinst says, you can have the parser build an NFA for each subexpression of the regular expression and then hook them up as you go. For instance, for (a|b)*c
you'd first parse (a|b)*
to get NFA #1, then parse c
to get NFA #2, then clobber NFA #1's final state, changing it to #2's start state. And so on recursively. This is the conventional answer.
My code instead first creates a trivial NFA with just an accepting state and nothing more. Then it parses c
, extending the NFA: now we have an NFA that checks for c
and then accepts. Next, it recursively parses (a|b)*
, still extending the NFA. The contract of the parser, given a string re
and an NFA k
, is to parse the string to produce a result NFA that ends up at the start of k
when re
matches. This approach avoids the need to clobber bits of the partial NFAs to hook them together.
One can consider the regular expression in the parentheses as a separate NFA. All you care about is that it has an input state and an accept state. You just recursively parse the stuff in parentheses into a NFA and plug its input and accept states into the the appropriate places in the NFA that you are constructing. The tricky part of parsing the infix expression is getting the operator precedences correct, which will take as much work as converting to postfix.
I suspect that what he means is to instead of outputting the postfix (from, say the shunting yard algorithm) and then reparsing the postfix, just process the postfix tokens as you are ready to output them (instead of outputting them).
精彩评论