PHP's preg-match_all causing Apache Segfault
I'm using two regular expressions to pull assignments out of MySQL queries and using them to create an audit trail. One of them is the 'picky' one that requires quoted column names/etc., the other one does not.
Both of them are tested and parse the values out correctly. The issue I'm having is that with certain queries the 'picky' regexp is actually just causing Apache to segfault.
I tried a variety of things to determine this was the cause up to leaving the regexp in the code, and just modifying the conditional to ensure it wasn't run (to rule out some sort of compile-time issue or something). No issues. It's only when it runs the regexp against specific queries that it segfaults, and I can't find any obvious pattern to tell me why.
The code in question:
if ($picky)
preg_match_all("/[`'\"]((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"] *= *'((?:[^'\\\\]|\\\\.)*)'/", $sql, $matches);
else
preg_match_all("/[`'\"]?((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"]? *= *[`'\"]?([^`'\" ,]+)[`'\"]?/", $sql, $matches);
The only difference between the two is that the first one removes the question marks on the quotes to make them non-optional and removes the option of using different kinds of quotes on the value - only allows single quotes. Replacing the first regexp with the second (for testing purposes) and using the same data removes the issue - it is definitely something to do with the regexp.
The specific SQL that is causing me grief is available at:
http://stackoverflow.pastebin.com/m75c2a2a0Interestingly enough, when I remove the highlighted section, it all works fine. Trying to submit the highlighted section by itself causes no error.
I'm pretty perplexed as to what's going on here. Can anyone offer any suggestions as to further debugging or a fix?
EDIT: Nothing terribly exciting, but for t开发者_StackOverflow社区he sake of completeness here's the relevant log entry from Apache (/var/log/apache2/error.log - There's nothing in the site's error.log. Not even a mention of the request in the access log.)
[Thu Dec 10 10:08:03 2009] [notice] child pid 20835 exit signal Segmentation fault (11)
One of these for each request containing that query.
EDIT2: On the suggestion of Kuroki Kaze, I tried gibberish of the same length and got the same segfault. Sat and tried a bunch of different lengths and found the limit. 6035 characters works fine. 6036 segfaults.
EDIT3: Changing the values of pcre.backtrack_limit
and pcre.recursion_limit
in php.ini
mitigated the problem somewhat. Apache no longer segfaults, but my regexp no longer matches all of the matches in the string. Apparently this is a long-known (from 2007) bug in PHP/PCRE:
EDIT4: I posted the code in the answers below that I used to replace this specific regular expression as the workarounds weren't acceptable for my purpose (product for sale, can't guarantee php.ini changes and the regexp only partially working removed functionality we require). Code I posted is released into the public domain with no warranty or support of any kind. I hope it can help someone else. :)
Thank you everyone for the help!
Adam
I have been hit with a similar preg_match-related issue, same Apache segfault. Only the preg_match that causes it is built-into the CMS I'm using (WordPress).
The "workaround" that was offered was to change these settings in php.ini:
[Pcre] ;PCRE library backtracking limit. ;pcre.backtrack_limit=100000 pcre.recursion_limit=200000000 pcre.backtrack_limit=100000000
The trade-off is for rendering larger pages, (in my case, > 200 rows; when one of the columns is limited to a 1500-character text description), you'll get pretty high CPU utilization, and I'm still seeing the segfaults. Just not as frequently.
My site's close to end-of-life, so I don't really have much need (or budget) to look for a real solution. But maybe this can mitigate the issue you're seeing.
Interestingly enough, when I remove the highlighted section, it all works fine. Trying to submit the highlighted section by itself causes no error.
What about size of the submission? If you pass gibberish of equal length, what will happen?
EDIT: splitting and merging will look something like this:
$strings = explode("\n", $sql);
$matches = array(array(), array(), array());
foreach ($strings AS $string) {
preg_match_all("/[`'\"]?((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"]? *= *[`'\"]?([^`'\" ,]+)[`'\"]?/", $string, $matches_temp);
$matches[0] = array_merge($matches[0], $matches_temp[0]);
$matches[1] = array_merge($matches[1], $matches_temp[1]);
$matches[2] = array_merge($matches[2], $matches_temp[2]);
}
Given that this only needs to match against the queries when saving pages or performing other not very often-executed operations, I felt the performance hit of the following code was acceptable. It parses the SQL query ($sql
) and places name=>value pairs into $data
. Seems to be working well and handles large queries fine.
$quoted = '';
$escaped = false;
$key = '';
$value = '';
$target = 'key';
for ($i=0; $i<strlen($sql); $i++)
{
if ($escaped)
{
$$target .= $sql[$i];
$escaped = false;
}
else if ($quoted!='')
{
if ($sql[$i]=='\\')
$escaped = true;
else if ($sql[$i]==$quoted)
$quoted = '';
else
$$target .= $sql[$i];
}
else
{
if ($sql[$i]=='\'' || $sql[$i]=='`')
{
$quoted = $sql[$i];
$$target = '';
}
else if ($sql[$i]=='=')
$target = 'value';
else if ($sql[$i]==',')
{
$target = 'key';
$data[$key] = $value;
$key = '';
$value = '';
}
}
}
if ($value!='')
$data[$key] = $value;
Thank you everyone for the help and direction!
精彩评论