Mutually exclusive regular expressions

2023-01-02 17:43 问答作者：

If I have a list of regular expressions, is there an easy way to determine that no two of them will both return a match for the same string?

That is, the list is valid if and only if for all strings a maximum of one item in the list will match the entire string.

开发者_StackOverflow中文版

It seems like this will be very hard (maybe impossible?) to prove definitively, but I can't seem to find any work on the subject.

The reason I ask is that I am working on a tokenizer that accepts regexes, and I would like to ensure only one token at a time can match the head of the input.

If you're working with pure regular expressions (no backreferences or other features that cause them to recognize context-free or more complicated languages), what you ask is possible. What you can do is convert each regex to a DFA, then (since regular languages are closed under intersection) combine them into a DFA that recognizes the intersection of the two languages. If that DFA has a path from the start state to an accepting state, that string is accepted by both input regexen.

The problem with this is that the first step of the usual regex->DFA algorithm is to convert the regex to a NFA, then convert the NFA to a DFA. But that last step can result in an exponential blowup in the number of DFA states, so this will only be feasible for very simple regular expressions.

If you are working with extended regex syntax, all bets are off: context-free languages are not closed under intersection, so this method won't work.

The Wkipedia article on regular expressions does state

It is possible to write an algorithm which for two given regular expressions decides whether the described languages are essentially equal, reduces each expression to a minimal deterministic finite state machine, and determines whether they are isomorphic (equivalent).

but gives no further hints.

Of course the easy way you are after is to run a lot of tests -- but we all know the shortcomings of testing as a method of proof.

You can't do that by only looking at the regular expression.

Consider the case where you have [0-9] and [0-9]+. They are obviously different expressions, but when applied to the string "1", they both produce the same result. When applied to string "11" they produce different results.

The point is that a regular expression isn't enough information. The result depends both on the regex and the target string.

继续阅读：mutual-exclusion regex

Mutually exclusive regular expressions

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？