Python's Regular Expression Source String Length
In Python Regular Expressions,
re.compile("x"*50000)
gives me OverflowError: regular expression code size limit exceeded
but following one does not get any error, but it hits 100% CPU, and took 1 minute in my PC
>>> re.compile(".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000)
<_sre.SRE_Pattern object at 0x03FB0020>
Is that normal?
Should I assume, ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000
is shorter than "x"*50000
?
Tested on Python 2.6, Win32
UPDATE 1:
It Looks like ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000
could be reduce to .*?
So, how about this one?
re.compile(".*?x"*50000)
It does compile, and if that one also can reduce to ".*?x"
, it should match to string "abcx"
or "x"
alone, but it does not match.
So, Am I missing something?
UPDATE 2:
My Point is not to know max limit of regex source strings, I like to know some reasons/concepts of "x"*50000
caught by overflow handler, but not on ".*?x"*50000
.
It does not make sense for me, thats why.
It is something missing on overflow checking or Its just fine or its really overflowing something?
Any Hints/Opinions will be appreciated.
The difference is that ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000
can be reduced to ".*?"
, while "x"*50000
has to generate 50000 nodes in the FSM (or a similar structure used by the regex engine).
EDIT: Ok, I was wrong. It's not that smart. The reason why "x"*50000
fails, but ".*?x"*50000
doesn't is that there is a limit on size of one "code item". "x"*50000
will generate one long item and ".*?x"*50000
will generate many small items. If you could split the string literal somehow without changing the meaning of the regex, it would work, but I can't think of a way to do that.
you want to match 50000 "x"s , correct??? if so, an alternative without regex
if "x"*50000 in mystring:
print "found"
if you want to match 50000 "x"s using regex, you can use range
>>> pat=re.compile("x{50000}")
>>> pat.search(s)
<_sre.SRE_Match object at 0xb8057a30>
on my system it will take in length of 65535 max
>>> pat=re.compile("x{65536}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/re.py", line 188, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.6/re.py", line 241, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python2.6/sre_compile.py", line 529, in compile
groupindex, indexgroup
RuntimeError: invalid SRE code
>>> pat=re.compile("x{65535}")
>>>
I don't know if there are tweaks in Python we can use to increase that limit though.
精彩评论