Testing intersection of two regular languages
I want to test whether two languages have a string in common. Both of these languages are from a subset of regular languages described below and I only need to know whether there exists a string in both languages, not produce an example string.
The language is specified by a glob-like string like
/foo/**/bar/*.baz
where **
matches 0 or more characters, and *
matches zero or more characters that are not /
, and all other characters are literal.
Any ideas?
thanks, mike
EDIT:
I implemented something that seems to perform well, but have yet to try a 开发者_StackOverflow中文版correctness proof. You can see the source and unit tests
Build FAs A
and B
for both languages, and construct the "intersection FA" AnB
. If AnB
has at least one accepting state accessible from the start state, then there is a word that is in both languages.
Constructing AnB
could be tricky, but I'm sure there are FA textbooks that cover it. The approach I would take is:
- The states of
AnB
is the cartesian product of the states ofA
andB
respectively. A state inAnB
is written(a, b)
wherea
is a state inA
andb
is a state inB
. - A transition
(a, b) ->r (c, d)
(meaning, there is a transition from(a, b)
to(c, d)
on symbolr
) exists iffa ->r c
is a transition inA
, andb ->r d
is a transition inB
. (a, b)
is a start state inAnB
iffa
andb
are start states inA
andB
respectively.(a, b)
is an accepting state inAnB
iff each is an accepting state in its respective FA.
This is all off the top of my head, and hence completely unproven!
I just did a quick search and this problem is decidable (aka can be done), but I don't know of any good algorithms to do it. One is solution is:
- Convert both regular expressions to NFAs A and B
- Create a NFA, C, that represents the intersection of A and B.
- Now try every string from 0 to the number of states in C and see if C accepts it (since if the string is longer it must repeat states at one point).
I know this might be a little hard to follow but this is only way I know how.
精彩评论