Pathological regex that blows up (time & memory)?

What's a pathological regex that blows up many parsers (in both time and memory), and which parsers does it affect? Bonus points the more basic and standard the regex is, and the more likely it is that a non-malicious user might innocently come up with it. Feel free to post actual time and memory data, and the parser version.

(I seem to remember that excessive lookbehind assertions or (EDIT:) backtracking in Perl are said to do this, or at least used to. Anything else?)


Adapted from the first example in the article Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...):

perl -e '$n=29; ("a" x $n) =~ (("a?" x $n).("a" x $n))'

This takes 40+ seconds on my system. Then increment $n for exponentially increasing fun: each step roughly doubles the running time.
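The article's title also names Python, so here is a minimal sketch of the same experiment against Python's backtracking `re` module. The helper names (`pathological`, `time_backtracking`) are illustrative, and no particular timings are claimed; keep n small, since each increment should roughly double the work for a backtracking engine.

```python
import re
import time


def pathological(n: int) -> tuple[str, str]:
    """Return (pattern, subject) for the example: 'a?' * n + 'a' * n against 'a' * n."""
    return "a?" * n + "a" * n, "a" * n


def time_backtracking(n: int) -> float:
    """Seconds Python's backtracking re engine spends matching the n-th instance."""
    pattern, subject = pathological(n)
    start = time.perf_counter()
    match = re.match(pattern, subject)
    elapsed = time.perf_counter() - start
    # The match does succeed (every a? ends up empty); the engine just tries the
    # greedy choices first and backtracks through them before finding it.
    assert match is not None and match.group(0) == subject
    return elapsed


if __name__ == "__main__":
    # Small n only: the work grows exponentially with n.
    for n in range(10, 19, 2):
        print(f"n={n:2d}  {time_backtracking(n):.4f} s")
```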


From Russ Cox's excellent article:

$ perl -e '("a" x 100000) =~ /^(ab?)*$/;'

This apparently causes a segfault. There are more examples in the article.
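Cox's article argues that these blow-ups come from backtracking implementations rather than from regular expressions themselves: a Thompson-NFA simulation tracks the set of all states the pattern could be in and never backtracks, so its running time is bounded by roughly len(pattern) × len(subject). A minimal sketch of that idea, under a simplifying assumption: the hypothetical mini-syntax here only supports literal characters, each optionally followed by `?`, which is just enough for the `a?`ⁿ`a`ⁿ pattern from the Perl one-liner earlier in the thread. It uses full-match semantics; for that subject this gives the same answer as Perl's unanchored `=~`, because the pattern cannot match fewer than n characters.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, bool]  # (literal character, is_optional)


def parse_atoms(pattern: str) -> List[Atom]:
    """Parse a restricted pattern like 'a?a?aa' into atoms (hypothetical mini-syntax)."""
    atoms: List[Atom] = []
    i = 0
    while i < len(pattern):
        optional = i + 1 < len(pattern) and pattern[i + 1] == "?"
        atoms.append((pattern[i], optional))
        i += 2 if optional else 1
    return atoms


def closure(states: Set[int], atoms: List[Atom]) -> Set[int]:
    """Add every state reachable by skipping optional atoms (the epsilon transitions)."""
    result: Set[int] = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(atoms) and atoms[s][1]:
            stack.append(s + 1)
    return result


def nfa_fullmatch(pattern: str, subject: str) -> bool:
    """Thompson-style simulation: state k means 'atoms[:k] have been matched'."""
    atoms = parse_atoms(pattern)
    current = closure({0}, atoms)
    for ch in subject:
        # Advance every live state that can consume ch, all at once.
        stepped = {s + 1 for s in current if s < len(atoms) and atoms[s][0] == ch}
        current = closure(stepped, atoms)
        if not current:  # no live states left: fail early
            return False
    return len(atoms) in current  # accept state = every atom consumed
```

With n = 29 the simulation holds at most 59 live states per input character, so it should finish essentially instantly where the Perl one-liner took 40+ seconds. Supporting groups and `*`, as `^(ab?)*$` needs, requires compiling a real parse tree into NFA fragments, which the article walks through.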


I always use this regex in PHP to match string literals inside PHP or JavaScript source code:

~'(\\.|[^'])*'|"(\\.|[^"])*"~s

It almost always fails on a very long string (about 50,000 characters is enough).
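A likely cause, offered as a hedged explanation: in a backtracking engine such as PCRE (which PHP's `preg_*` functions use), `(\\.|[^'])*` is a two-way alternation inside a repeated group, so the engine keeps backtracking state for every character of the literal. On long inputs this typically exhausts one of PCRE's limits (in PHP, `pcre.backtrack_limit`, `pcre.recursion_limit`, or the JIT stack), and `preg_match` then returns `false` instead of a match. A commonly suggested fix is Friedl's "unrolling the loop": consume runs of ordinary characters with a single character class and step through backslash escapes one at a time. This sketch uses Python's `re` to check, on a small well-formed sample, that the unrolled pattern finds the same literals as the original, and that it handles a 50,000-character literal. It does not reproduce PHP's failure, and the names are illustrative.

```python
import re

# The regex from the question; PHP's ~...~ delimiters and /s flag become re.S.
ORIGINAL = re.compile(r"'(\\.|[^'])*'" + r'|"(\\.|[^"])*"', re.S)

# Unrolled form: runs of non-quote, non-backslash characters are eaten by one
# character class, and a backslash always consumes exactly the character after it.
UNROLLED = re.compile(
    r"'[^'\\]*(?:\\.[^'\\]*)*'"
    r'|"[^"\\]*(?:\\.[^"\\]*)*"',
    re.S,
)


def literals(pattern: re.Pattern, source: str) -> list[str]:
    """Return every quoted literal the pattern finds, as whole-match strings."""
    return [m.group(0) for m in pattern.finditer(source)]


if __name__ == "__main__":
    sample = r"""echo 'it\'s' . "say \"hi\"";"""
    print(literals(ORIGINAL, sample))
    print(literals(UNROLLED, sample))

    long_literal = "'" + "x" * 50_000 + "'"
    print(UNROLLED.fullmatch(long_literal) is not None)
```

Note that the two patterns differ on malformed input: the original also lets `[^']` match a lone backslash, so it can backtrack into reading `\'` as "backslash, then closing quote", which the unrolled form never does.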
