Does an algorithm exist which can determine whether one regular language matches any input another regular language matches?
Let's say we have regular expressions:
- Hello W.*rld
- Hello World
- .* World
- .* W.*
I would like to minimize the number of regexes required to match arbitrary input.
To do that, I need to find if one regular expression matches any input matched by another expression. Is that possible?
Billy3
Any regular expression can be converted to a DFA. You can minimize the DFA, and since the minimal form is unique (up to renaming of states), you can decide whether two expressions are equivalent. Dani Cricco pointed out Hopcroft's O(n log n) minimization algorithm. There is also an algorithm by Hopcroft and Karp which tests the equivalence of two DFAs directly, without minimizing them, in almost linear time.
For a good survey on the matter and an interesting approach to this, I recommend the paper Testing the Equivalence of Regular Languages, from arXiv.
Later edit: if you are interested in inclusion rather than equivalence for regular expressions, I have come across a paper that might be of interest: Inclusion Problem for Regular Expressions. I have only skimmed through it, but it seems to contain a polynomial-time algorithm for the problem, presumably for a restricted class of expressions, since inclusion for general regular expressions is PSPACE-complete.
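To make the direct equivalence test concrete, here is a minimal sketch in the spirit of the Hopcroft and Karp union-find approach. It assumes each DFA is complete and is given as a hypothetical tuple `(start, accepting_states, transitions)` where `transitions` maps `(state, symbol)` to a state; the function name `hk_equivalent` and the sample automata are made up for illustration, and building the DFAs from regular expressions is not shown.

```python
def hk_equivalent(dfa1, dfa2, alphabet):
    """Return True if the two complete DFAs accept the same language."""
    autos = {1: dfa1, 2: dfa2}
    parent = {}  # union-find forest over tagged states (tag, state)

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def accepting(state):
        tag, s = state
        return s in autos[tag][1]

    def step(state, ch):
        tag, s = state
        return (tag, autos[tag][2][(s, ch)])

    # Start by claiming that the two start states are equivalent.
    a, b = (1, dfa1[0]), (2, dfa2[0])
    parent[find(a)] = find(b)
    stack = [(a, b)]
    while stack:
        p, q = stack.pop()
        # A merged pair that disagrees on acceptance refutes equivalence.
        if accepting(p) != accepting(q):
            return False
        for ch in alphabet:
            rp, rq = find(step(p, ch)), find(step(q, ch))
            if rp != rq:
                parent[rp] = rq  # the successors must be equivalent too
                stack.append((rp, rq))
    return True


# Hand-built sample DFAs over the alphabet {"a"} (illustrative only).
A_STAR = (0, {0}, {(0, "a"): 0})                      # a*
A_STAR_TWO = (0, {0, 1}, {(0, "a"): 1, (1, "a"): 0})  # a* with a redundant state
A_PLUS = (0, {1}, {(0, "a"): 1, (1, "a"): 1})         # aa*

print(hk_equivalent(A_STAR, A_STAR_TWO, "a"))  # True
print(hk_equivalent(A_STAR, A_PLUS, "a"))      # False
```

Each successful merge reduces the number of classes by one, so the loop performs at most one merge per state of the two automata combined.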
Yes.
The problem of equivalence of two regular languages is decidable.
Sketch of an algorithm:
- minimize both DFAs
- check if they are isomorphic
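The two steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: it uses simple partition refinement (Moore's method) rather than Hopcroft's faster algorithm, it assumes complete DFAs given as hypothetical `(start, accepting_states, transitions)` tuples, and it replaces a general isomorphism check with a canonical renumbering of states so that isomorphic minimal DFAs compare equal. The names `minimize` and `equivalent` are made up for this example.

```python
def minimize(dfa, alphabet):
    """Return a canonically numbered minimal DFA for a complete DFA."""
    start, accepting, delta = dfa
    symbols = sorted(alphabet)

    # Keep only the states reachable from the start state.
    reachable, seen = [start], {start}
    for s in reachable:
        for ch in symbols:
            t = delta[(s, ch)]
            if t not in seen:
                seen.add(t)
                reachable.append(t)

    # Moore refinement: split blocks until no block can be split further.
    block = {s: int(s in accepting) for s in reachable}
    while True:
        ids, refined = {}, {}
        for s in reachable:
            sig = (block[s],) + tuple(block[delta[(s, ch)]] for ch in symbols)
            refined[s] = ids.setdefault(sig, len(ids))
        stable = len(ids) == len(set(block.values()))
        block = refined
        if stable:
            break

    # Canonical numbering: breadth-first over blocks from the start block.
    rep = {}
    for s in reachable:
        rep.setdefault(block[s], s)
    order, queue, trans = {block[start]: 0}, [block[start]], {}
    for b in queue:
        for ch in symbols:
            t = block[delta[(rep[b], ch)]]
            if t not in order:
                order[t] = len(order)
                queue.append(t)
            trans[(order[b], ch)] = order[t]
    accept = {order[block[s]] for s in reachable if s in accepting}
    return (0, accept, trans)


def equivalent(dfa1, dfa2, alphabet):
    """Two DFAs are equivalent iff their canonical minimal forms coincide."""
    return minimize(dfa1, alphabet) == minimize(dfa2, alphabet)


# Hand-built sample DFAs over the alphabet {"a"} (illustrative only).
A_STAR = (0, {0}, {(0, "a"): 0})                      # a*
A_STAR_TWO = (0, {0, 1}, {(0, "a"): 1, (1, "a"): 0})  # a* with a redundant state
A_PLUS = (0, {1}, {(0, "a"): 1, (1, "a"): 1})         # aa*

print(equivalent(A_STAR, A_STAR_TWO, "a"))  # True
print(equivalent(A_STAR, A_PLUS, "a"))      # False
```

Numbering the blocks in breadth-first order from the start state is what turns the isomorphism check into a plain equality comparison.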
Sure! A regular expression can be represented as an FSM (finite state machine), and there are infinitely many FSMs that recognize the same language.
Two FSMs are equivalent if they accept the same language; once minimized, equivalent deterministic FSMs are isomorphic, that is, identical up to renaming of states. There are a couple of algorithms to minimize an FSM. For example, Hopcroft's minimization algorithm runs in O(n log n) on an n-state automaton.
This problem is called "inclusion" or "subsumption" of regular expressions, because what you are asking is whether the set of words matched by one regexp includes (or subsumes) the set of words matched by the other regexp. Equality is a different question, which usually means whether two regexps match exactly the same words, i.e. whether they are functionally equivalent. For example, "a*" includes "aa*", but the two are not equal.
All known algorithms for regexp inclusion take time exponential in the size of the regexp in the worst case. The standard algorithm goes like this:
Input: r1 and r2. Output: yes if r1 includes r2.
- Create DFA(r1) and DFA(r2)
- Create Neg(DFA(r1)) (which matches exactly those words r1 doesn't match)
- Create Neg(DFA(r1)) x DFA(r2) (the product automaton, which matches exactly those words matched by both Neg(DFA(r1)) and DFA(r2))
- Check that the product automaton from the previous step does not match any word
This works, since what you are checking is that there are no words matched by r2 that are not matched by r1.
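A minimal sketch of that check, assuming both DFAs are complete, share the same alphabet, and are given as hypothetical `(start, accepting_states, transitions)` tuples. Instead of materializing Neg(DFA(r1)) and the product, it explores the reachable product states on the fly and looks for one where r2 accepts and r1 does not. The name `includes` and the sample automata are illustrative, and the regexp-to-DFA construction is not shown.

```python
from collections import deque


def includes(dfa1, dfa2, alphabet):
    """Return True if every word accepted by dfa2 is also accepted by dfa1."""
    start1, accept1, delta1 = dfa1
    start2, accept2, delta2 = dfa2
    seen = {(start1, start2)}
    queue = deque(seen)
    while queue:
        s1, s2 = queue.popleft()
        # Accepting state of Neg(DFA(r1)) x DFA(r2): r2 matches, r1 does not.
        if s2 in accept2 and s1 not in accept1:
            return False
        for ch in alphabet:
            nxt = (delta1[(s1, ch)], delta2[(s2, ch)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Hand-built sample DFAs over the alphabet {"a"} (illustrative only).
A_STAR = (0, {0}, {(0, "a"): 0})                # a*
A_PLUS = (0, {1}, {(0, "a"): 1, (1, "a"): 1})   # aa*

print(includes(A_STAR, A_PLUS, "a"))  # True: a* includes aa*
print(includes(A_PLUS, A_STAR, "a"))  # False: aa* misses the empty word
```

The search visits at most one pair per combination of states, so the check itself is polynomial in the size of the DFAs; the exponential worst case comes from building the DFAs out of the regexps.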