Algorithm for regular expression intersection with cfg
Im looki开发者_Python百科ng for an algorithm which outputs if the intersection of a regular expression and a contex free grammar is empty or not. I know that this problem is decidable, however, I cannot find any example implementation (in pseudocode).
Can someone provide me with such an algorithm, in .NET if possible but this is not a must. This problem is also called "regular intersection". Googling for it only gives me the geometrical algorithm or the theory about it.
edit:
Anybody. Im really stuck on it, and cannot find anything yet.
Here is a sketch of an approach that occurs to me. I think this should work but it is probably not the best way to do it since it uses the terribly messy conversion from PDA to CFG.
Convert the regular expression into a nondeterministic finite automaton (NFA) and reduce it down to the minimal determinsitic finite automaton (DFA). Convert the context free grammar (CFG) into a pushdown automoton (PDA). These first steps are all well known and fairly simple algorithms.
Take the intersection of the DFA and PDA, which is also a PDA. We will say the DFA has states S1, start state s1, final states F1, and transitions delta1 of the form ((source,trigger),destination), and the PDA has states S2, start state s2, final states F2, and transitons delta2 of the form ((source,trigger,pop),(destination,push)). The new PDA has states S1 X S2, each state labeled by a pair. It has final states F1 X F2, and start state (s1,s2). Now for the transitions.
For each transition d an element of delta2, for each state s an element s1, find the transition t an element of delta1 of the form ((s,d.trigger),?). Make a new transition (((d.source, s), d.trigger, d.pop),((d.destination, t.destination),d.push)).
This new PDA should accept the intersection of the languages produced by the RE and the CFG. To test if the language is empty you will need to convert it back to a CFG. The algorithm for that is messy and large, but it works. Once you have done that, mark each terminal symbol. Then mark each symbol which has a rule where there are only marked symbols on the right hand side, and repeat until you can mark no more symbols. If you can mark the start symbol, the language is not empty. Otherwise, the language is empty.
In fact, there is a simpler algorithm for computing the intersection between a context-free grammar and a regular expression. It does not use push-down automata, which can be costly to obtain from CFG with several conversions.
This solution was presented in:
Y. Ba-Hillel, M. Prles, and E. Shamir. 1965. On formal properties of simple phrase structure grammars. Z. Phonetik, Sprachwissen. Komm. 15 (I961), 143-172. Y. Bar-Hillel, Language and Information, Addison-Wesley, Reading, Mass (1965), 116–150.
but you can find a simpler version in:
Richard Beigel and William Gasarch. .. A Proof that the intersection of a context-free language and a regular language is Context-Free Which Does Not use pushdown automata. http://www.cs.umd.edu/~gasarch/BLOGPAPERS/ cfg.pdf (.).
If it help, this solution was implemented in Pyformlang (https://pyformlang.readthedocs.io/), and you can find it on Github for Python (https://github.com/Aunsiels/pyformlang/blob/master/pyformlang/cfg/cfg.py)
精彩评论