Regex takes too long to match the result

I have this regex pattern:

<(\d+)>(\d+\.\d+|\d{4}\-\d+\-\d+\s+\d{2}:\d{2}:\d{2})(?:\..*?)*\s+(ALER|NOTI)

and this is my input (which does not match at all):

<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 192.168.2.87 2795 "-" "-" GET HTTP 192.168.2.145 HTTP/1.1 200 36200 0 1038 544 192.168.2.221 80 540  SERVER DEFAULT PASSIVE VALID /joomla/ "-" http://192.168.2.145/joomla/index.php?option=com_content&view=a be4d44e8f3986183a87991398c1c212e=1;      be4d44e8f3986183a87991398c1c212e=1

This correctly returns no match, but it takes far too long to produce that result. Since I receive about a thousand logs/inputs per second, every single log/input has to be processed very quickly. Sometimes the CPU reaches 100%.

Can anyone help me solve this regex problem?

Thanks


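You have catastrophic backtracking because the expression (?:\..*?)* can match the same text in a very large number of ways. Every dot after the first can either start a new repetition of the group or be consumed by the lazy .*?, so the number of paths the engine may have to try grows exponentially with the number of dots in your string, potentially into the millions. To fix it, you can change this: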

(?:\..*?)*\s+

to this:

\..*\s
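
As a rough check of the suggestion above, here is a minimal Python sketch. Python is an assumption, since the question does not name a language or regex engine. FIXED is the question's pattern with the tail swapped as suggested, and ALERT_LINE is a hypothetical matching line invented for illustration. Note that the rewritten tail requires at least one literal dot after the timestamp, whereas the original allowed none.

```python
import re

# The question's pattern with (?:\..*?)*\s+ replaced by \..*\s
FIXED = re.compile(
    r"<(\d+)>(\d+\.\d+|\d{4}\-\d+\-\d+\s+\d{2}:\d{2}:\d{2})\..*\s(ALER|NOTI)"
)

# The log line from the question (contains neither ALER nor NOTI).
NO_MATCH_LINE = (
    '<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 192.168.2.87 2795 '
    '"-" "-" GET HTTP 192.168.2.145 HTTP/1.1 200 36200 0 1038 544 '
    '192.168.2.221 80 540  SERVER DEFAULT PASSIVE VALID /joomla/ "-" '
    'http://192.168.2.145/joomla/index.php?option=com_content&view=a '
    'be4d44e8f3986183a87991398c1c212e=1;      be4d44e8f3986183a87991398c1c212e=1'
)

# Hypothetical line, made up for this example, that should match.
ALERT_LINE = "<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 ALER example"

# With a single greedy .* there is only one way to consume the text,
# so a failing line is rejected after one backward scan.
print(FIXED.search(NO_MATCH_LINE))   # None

m = FIXED.search(ALERT_LINE)
print(m.groups())                    # ('150', '2010-12-29 18:11:30', 'ALER')
```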


It looks like you are after the date/time and similar information on the ALER/NOTI lines. Can't you select only those lines first by grepping for ALER/NOTI? It would then probably be a lot easier to run the regex on just those interesting lines (and it would probably simplify the regex too).
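
A minimal sketch of that pre-filtering idea, assuming Python (the question does not say which language is in use). The function name parse_line is hypothetical, and the regex is the question's pattern with the simplified tail \..*\s. A plain substring test is much cheaper than a regex search, so lines without either keyword never reach the regex at all. The filter only skips lines; it does not by itself make a backtracking-prone pattern safe for lines that pass it.

```python
import re

# Question's pattern with the simplified tail \..*\s (an assumption here).
PATTERN = re.compile(
    r"<(\d+)>(\d+\.\d+|\d{4}\-\d+\-\d+\s+\d{2}:\d{2}:\d{2})\..*\s(ALER|NOTI)"
)


def parse_line(line):
    """Return the captured groups for an ALER/NOTI line, or None otherwise."""
    # Cheap substring check first: most lines are rejected here.
    if "ALER" not in line and "NOTI" not in line:
        return None
    m = PATTERN.search(line)
    return m.groups() if m else None


# Hypothetical sample lines, invented for illustration.
lines = [
    "<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 GET HTTP",
    "<150>2010-12-29 18:11:31.004 -0700 192.168.2.145 NOTI example",
]
for line in lines:
    print(parse_line(line))
```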


Since you didn't provide a working example, the only thing to go on as to why it's slow
is this (?:\..*?)*, which is bizarre. The meta period . matches anything, including a literal
period. That expression says: if there is a literal period, take it and everything up to the \s.
But the literal period is optional. A possible replacement:

(?:\.(?:(?!\s(?:ALER|NOTI)).)*?)?\s+(ALER|NOTI)

This is itself rather bizarre, but it is easier to read when expanded:

(?:
    \.
    (?:
        (?!\s(?:ALER|NOTI)).
    )*?
)?
\s+
(ALER|NOTI)
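
The expanded layout above can be kept as-is in an engine that supports free-spacing mode and lookahead. Here is a minimal sketch assuming Python's re.VERBOSE, with the prefix taken from the question's pattern. The name TEMPERED and both sample lines are hypothetical, invented for illustration.

```python
import re

TEMPERED = re.compile(
    r"""
    <(\d+)>                                        # number in angle brackets
    (\d+\.\d+|\d{4}\-\d+\-\d+\s+\d{2}:\d{2}:\d{2})  # numeric or date/time stamp
    (?:
        \.                          # optional part starts with a literal dot
        (?:
            (?!\s(?:ALER|NOTI)).    # any char, unless whitespace+keyword starts here
        )*?
    )?
    \s+
    (ALER|NOTI)
    """,
    re.VERBOSE,
)

# Hypothetical sample lines.
hit = "<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 ALER example"
miss = "<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 GET HTTP"

print(TEMPERED.search(hit).groups())   # ('150', '2010-12-29 18:11:30', 'ALER')
print(TEMPERED.search(miss))           # None
```

Because the optional group can be entered at most once, there is no nested repetition left to backtrack through.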