Maximum match length of a regular expression
What is the easiest way to determine the maximum match length of a regular expression?
Specifically, I am using Python's re
module.
E.g. fo开发者_StackOverflow社区r foo((bar){2,3}|potato)
it would be 12.
Obviously, regexes using operators like *
and +
have theoretically unbounded match lengths; in those cases returning an error or something is fine. Giving an error for regexes using the (?...)
extensions is also fine.
I would also be ok with getting an approximate upper bound, as long as it is always greater than the actual maximum length, but not too much greater.
Using pyparsing's invRegex module:
import invRegex
data='foo(bar{2,3}|potato)'
print(list(invRegex.invert(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,invRegex.invert(data))))
# 9
Another alternative is to use ipermute
from this module.
import inverse_regex
data='foo(bar{2,3}|potato)'
print(list(inverse_regex.ipermute(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,inverse_regex.ipermute(data))))
# 9
Solved, I think. Thanks to unutbu for pointing me to sre_parse
!
import sre_parse
def get_regex_max_match_len(regex):
minlen, maxlen = sre_parse.parse(regex).getwidth()
if maxlen >= sre_parse.MAXREPEAT: raise ValueError('unbounded regex')
return maxlen
Results in:
>>> get_regex_max_match_len('foo((bar){2,3}|potato)')
12
>>> get_regex_max_match_len('.*')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in get_regex_max_match_len
ValueError: unbounded regex
精彩评论