开发者

Maximum match length of a regular expression

What is the easiest way to determine the maximum match length of a regular expression?

Specifically, I am using Python's re module.

E.g. fo开发者_StackOverflow社区r foo((bar){2,3}|potato) it would be 12.

Obviously, regexes using operators like * and + have theoretically unbounded match lengths; in those cases returning an error or something is fine. Giving an error for regexes using the (?...) extensions is also fine.

I would also be ok with getting an approximate upper bound, as long as it is always greater than the actual maximum length, but not too much greater.


Using pyparsing's invRegex module:

import invRegex
data='foo(bar{2,3}|potato)'    
print(list(invRegex.invert(data)))
# ['foobarr', 'foobarrr', 'foopotato']    
print(max(map(len,invRegex.invert(data))))
# 9

Another alternative is to use ipermute from this module.

import inverse_regex
data='foo(bar{2,3}|potato)'
print(list(inverse_regex.ipermute(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,inverse_regex.ipermute(data))))
# 9


Solved, I think. Thanks to unutbu for pointing me to sre_parse!

import sre_parse

def get_regex_max_match_len(regex):
    minlen, maxlen = sre_parse.parse(regex).getwidth()
    if maxlen >= sre_parse.MAXREPEAT: raise ValueError('unbounded regex')
    return maxlen

Results in:

>>> get_regex_max_match_len('foo((bar){2,3}|potato)')
12
>>> get_regex_max_match_len('.*')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in get_regex_max_match_len
ValueError: unbounded regex
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜