
Improving Perl regex performance by adding +

I have some regexes in a Perl script that are correct but slow. I am considering trying to improve performance by adding extra + operators (i.e. *+ instead of * and ++ instead of +) to disable backtracking. I tried replacing all of them and the regexes stopped working... so much for the simple solution. How do I know where I can add them without breaking the regex?


If the regexes stopped working, either you aren't using a version of Perl that supports possessive quantifiers (they were added in 5.10), or you actually do need backtracking in those cases.

Identify sections of the regex that won't ever need backtracking (that is, sections where, when asked to match starting at a given point, there is never more than one length you might want them to match), and surround them with (?> ). This has the same effect as ++/*+ and is supported even pre-5.10.
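To make the "only one length" test concrete, here is a minimal sketch. The input string and the variable names are made up for illustration. It shows one case where locking the quantifier is harmless and one where the plain regex quietly depends on backtracking:

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for illustration.
my $input = "12345;";

# Safe: \d+ is followed by ";", which a digit can never match, so giving
# digits back could never help. All three forms match the same thing.
my $plain      = $input =~ /^\d+;/     ? 1 : 0;
my $atomic     = $input =~ /^(?>\d+);/ ? 1 : 0;
my $possessive = $input =~ /^\d++;/    ? 1 : 0;   # needs perl 5.10 or later

# Unsafe: \d+ can also swallow the trailing "5", so the plain form only
# matches because the engine backtracks one digit. The atomic form cannot.
my $needs_backtrack = $input =~ /^\d+5;/     ? 1 : 0;
my $atomic_breaks   = $input =~ /^(?>\d+)5;/ ? 1 : 0;

print "plain=$plain atomic=$atomic possessive=$possessive\n";
print "needs_backtrack=$needs_backtrack atomic_breaks=$atomic_breaks\n";
```

A rough rule of thumb that follows from this: a quantified piece is a candidate for (?> ) or ++/*+ when whatever comes right after it cannot start with a character the piece itself could have matched.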

Note that restricting backtracking is often not "optimization", since it changes what will and will not be matched. The idea is that you use it to better describe what you actually want matched. Borrowing from the article linked in the OP's answer, something like ^(.*?,){11}P (twelfth comma-separated field starts with P) is not just inefficient, it is incorrect, since backtracking will cause it to match even when only a field after the twelfth starts with P. By correcting it to ^(?>.*?,){11}P you restrict it to matching exactly the right number of leading fields. (In this trivial case, ^([^,]*,){11}P also does the job, but if you add support for escaped or quoted commas within fields using alternation, (?> becomes the easier choice.)
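A quick way to see the difference is to run the three patterns against two sample lines. This is only a sketch: the sample data, the hash names and the labels are invented here, and only the patterns come from the text above.

```perl
use strict;
use warnings;

# Hypothetical sample lines with 13 comma-separated fields each.
# In $twelfth the 12th field starts with P; in $thirteenth only the 13th does.
my $twelfth    = join ',', (('x') x 11), 'Pear', 'y';
my $thirteenth = join ',', (('x') x 12), 'Pear';

my %pattern = (
    lazy    => qr/^(.*?,){11}P/,     # lets a group stretch across commas
    atomic  => qr/^(?>.*?,){11}P/,   # each group is locked to one field
    negated => qr/^([^,]*,){11}P/,   # a group can never contain a comma
);

my %result;
for my $name (sort keys %pattern) {
    for my $case ([twelfth => $twelfth], [thirteenth => $thirteenth]) {
        my ($label, $line) = @$case;
        $result{$name}{$label} = $line =~ $pattern{$name} ? 1 : 0;
        print "$name on $label: $result{$name}{$label}\n";
    }
}
```

All three patterns accept the line whose twelfth field starts with P. Only the lazy one also accepts the line where the P is in the thirteenth field, because one of its groups backtracks and expands to cover two fields.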


Hmmm... once I posted the question, looking at the "Related" column led me to this article, which has some pretty good ideas: http://www.regular-expressions.info/catastrophic.html
