Why does Mono lock up on this regex?
This is the line where Mono on Linux locks up (I am using the 2.6.4 VM distro from the official site):
var match = Regex.Match(sz, linkPattern);
This is the pattern, which captures the link and the title:
var linkPattern = @"<\ba\b[^\>]*\bhref\b*=\b*""([^""\>]*)""[^\>]*\btitle\b*=\b*""([^""\>]*) by [^""\>]*""";
When Mono hits that line it doesn't crash or throw an exception. Using top, I see mono using 96% of the CPU. I don't know how long the string is; I suspect it's under 8 KB (I tested a different URL). It has been a few minutes since I ran the code, so something must be broken.
"Too many \b's" was my first reaction. But really: \b means word boundary. In my opinion, <\ba and <a should be identical. Also, \b* would therefore mean "optional repetition of word boundaries", which sounds rather confusing. I guess I've never used \b at all, and used \s? or \s* instead.
Did you try a different regex engine (Perl, PHP) to determine whether the lockup is due to Mono?
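The suggestion above, replacing \b* with \s* and dropping the redundant \b after <, could be sketched as follows. This is only an illustration, not a confirmed fix for the Mono lockup: the class name LinkPatternSketch and the sample HTML string are made up for the example, and the \b after <a is kept only so that tags such as <abbr are not matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkPatternSketch
{
    // Same intent as the original pattern, but with \s* (optional whitespace)
    // around '=' instead of \b* (a quantified zero-width assertion).
    public const string LinkPattern =
        @"<a\b[^>]*\bhref\s*=\s*""([^"">]*)""[^>]*\btitle\s*=\s*""([^"">]*) by [^"">]*""";

    public static void Main()
    {
        // Hypothetical input; the real page content is not shown in the question.
        string sz = @"<a class=""x"" href=""http://example.com/page"" title=""Some Title by Someone"">text</a>";

        Match match = Regex.Match(sz, LinkPattern);
        if (match.Success)
        {
            Console.WriteLine(match.Groups[1].Value); // the href value
            Console.WriteLine(match.Groups[2].Value); // the title text before " by "
        }
    }
}
```

Unlike the original, this version also tolerates whitespace around the = sign, which \b* never did, since a word boundary consumes no characters.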
There are some bugs in Mono's regex implementation that can cause it to recurse infinitely. Probably the only fix is to rewrite your pattern to be a simpler regular expression, or not use regular expressions for this task.
You may also want to file a bug. I think there is a Google Summer of Code student currently working on Mono's regular expression engine.
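While investigating, it may help to bound how long a match is allowed to run, so a runaway pattern shows up as a failure rather than a hung process. The sketch below assumes a runtime that provides the Regex constructor taking a match timeout (.NET Framework 4.5 or later, or a Mono release that includes it); Mono 2.6.4 does not have it. RegexGuard and MatchOrNull are hypothetical names for this example, and whether the timeout actually fires for the specific Mono bug is untested.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexGuard
{
    // Hypothetical helper: returns the Match, or null if matching
    // did not finish within the given timeout.
    public static Match MatchOrNull(string input, string pattern, TimeSpan timeout)
    {
        var regex = new Regex(pattern, RegexOptions.None, timeout);
        try
        {
            return regex.Match(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up; the caller can log the input and pattern.
            return null;
        }
    }

    public static void Main()
    {
        Match m = MatchOrNull("<a href=\"x\">", "<a\\b[^>]*>", TimeSpan.FromSeconds(2));
        Console.WriteLine(m != null && m.Success ? "matched" : "no match or timed out");
    }
}
```

A null result here only says that the match was abandoned; the pattern still needs to be simplified or reported as a bug, as suggested above.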