开发者

Testing intersection of two regular languages

I want to test whether two languages have a string in common. Both of these languages are from a subset of regular languages described below and I only need to know whether there exists a string in both languages, not produce an example string.

The language is specified by a glob-like string like

/foo/**/bar/*.baz

where ** matches 0 or more characters, and * matches zero or more characters that are not /, and all other characters are literal.

Any ideas?

thanks, mike

EDIT:

I implemented something that seems to perform well, but have yet to try a 开发者_StackOverflow中文版correctness proof. You can see the source and unit tests


Build FAs A and B for both languages, and construct the "intersection FA" AnB. If AnB has at least one accepting state accessible from the start state, then there is a word that is in both languages.

Constructing AnB could be tricky, but I'm sure there are FA textbooks that cover it. The approach I would take is:

  • The states of AnB is the cartesian product of the states of A and B respectively. A state in AnB is written (a, b) where a is a state in A and b is a state in B.
  • A transition (a, b) ->r (c, d) (meaning, there is a transition from (a, b) to (c, d) on symbol r) exists iff a ->r c is a transition in A, and b ->r d is a transition in B.
  • (a, b) is a start state in AnB iff a and b are start states in A and B respectively.
  • (a, b) is an accepting state in AnB iff each is an accepting state in its respective FA.

This is all off the top of my head, and hence completely unproven!


I just did a quick search and this problem is decidable (aka can be done), but I don't know of any good algorithms to do it. One is solution is:

  1. Convert both regular expressions to NFAs A and B
  2. Create a NFA, C, that represents the intersection of A and B.
  3. Now try every string from 0 to the number of states in C and see if C accepts it (since if the string is longer it must repeat states at one point).

I know this might be a little hard to follow but this is only way I know how.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜