foreach and preg_match on a large amount of data not working properly
I have two files: one is full of keyword sequences (~20k lines), the other is full of regular expressions (~2.5k).
I want to test each keyword against each regexp and print the ones that match. I checked my files, and that makes around 22 750 000 tests. I am using the following code:
$count = 0;   // number of regex/keyword tests performed
$countM = 0;  // number of successful matches
foreach ($arrayRegexp as $r) {
    foreach ($arrayKeywords as $k) {
        $count++;
        if (preg_match($r, $k, $match)) {
            $countM++;
            echo $k.' matched with keywords '.$match[1].'<br/>';
        }
    }
}
echo "$count tests with $countM matches.";
Unfortunately, after computing for a while, only some of the actual matches are displayed, and the final line with the counts never appears. Even stranger, if I comment out the preg_match section and keep only the two foreach loops and the count display, everything works fine.
I believe this is due to the excessive amount of data being processed, but I would like to know if there are recommendations I didn't follow for this kind of operation. The regular expressions I use are very complicated, and I cannot replace them with something else.
Ideas anyone?
There are two optimization options:
- Regular expressions can usually be combined into alternatives:
  /(regex1|regex2|...)/
  Often PCRE can evaluate alternatives faster than PHP can execute a loop.
- I'm not sure whether this is faster at all (it modifies the subjects), but you could pass the keywords array directly as the subject parameter of preg_replace_callback(), thus eliminating the second loop.
As an example:
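$rx = implode("|", $arrayRegexp); // only if the patterns lack /regexp/ delimiters
$countM = 0; // print is a language construct and cannot be a callback, so use a closure
preg_replace_callback("#($rx)#", function ($m) use (&$countM) { $countM++; echo $m[1].'<br/>'; return ''; }, $arrayKeywords);
The callback prints and counts each match and just returns an empty string, since the replaced keywords are discarded anyway.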
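A slightly fuller sketch of the combined-alternation idea, assuming the patterns are stored without /.../ delimiters or trailing modifiers and contain no unescaped # (the delimiter used here). The function name keywords_matching_any() and the sample data are hypothetical, and the arrow function needs PHP 7.4 or later. Each pattern is wrapped in a non-capturing group so an alternation inside one pattern cannot leak into its neighbours, and a false return from preg_match() is treated as an error: very complex patterns can hit PCRE's backtracking limit, and a plain if (preg_match(...)) would silently count that as "no match".

```php
<?php
// Minimal sketch, assuming every entry in $patterns is a bare regex body
// (no /.../ delimiters, no trailing flags, no unescaped '#').
// keywords_matching_any() and the sample data below are hypothetical.
function keywords_matching_any(array $patterns, array $keywords): array
{
    // Wrap each pattern in a non-capturing group, e.g. "a|b" and "c"
    // become (?:a|b)|(?:c), so each alternative stays self-contained.
    $combined = '#' . implode('|', array_map(fn ($p) => "(?:$p)", $patterns)) . '#';

    $matched = [];
    foreach ($keywords as $k) {           // one PHP loop instead of two
        $result = preg_match($combined, $k);
        if ($result === false) {          // PCRE failure, e.g. backtrack limit hit
            throw new RuntimeException('PCRE error code ' . preg_last_error());
        }
        if ($result === 1) {
            $matched[] = $k;
        }
    }
    return $matched;
}

$patterns = ['^foo\d+$', 'bar', 'ba(z|q)'];
$keywords = ['foo12', 'xbarx', 'baq', 'nothing'];
$matched = keywords_matching_any($patterns, $keywords);
echo count($matched) . ' of ' . count($keywords) . " keywords matched\n";
```

Note the trade-off: the combined pattern reports each keyword at most once, does not tell you which original regex matched, and shifts capture-group numbers compared with the individual patterns. If you need the per-regex pairing that the original double loop prints, this mainly narrows down which keywords are worth testing individually.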
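Come to think of it, preg_replace_callback() also accepts an array of regular expressions. When the subject is an array as well, each pattern is applied to each string in turn, but every pattern runs on the result of the previous replacement, so a callback that returns an empty string can hide text from the later patterns.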
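A minimal sketch of the array form, assuming the patterns keep their own /.../ delimiters; the variable names and sample data are hypothetical. Returning $m[0] from the callback leaves every subject unchanged, so later patterns still see the original keyword rather than a string with earlier matches removed.

```php
<?php
// Minimal sketch: arrays for both patterns and subjects, so every pattern
// is applied to every keyword. Names and sample data are hypothetical.
$patterns = ['/foo(\d+)/', '/b(a)r/'];
$keywords = ['foo12 bar', 'xbarx', 'nothing'];

$hits = [];
preg_replace_callback($patterns, function (array $m) use (&$hits): string {
    $hits[] = $m[0];   // record the full matched text
    return $m[0];      // return it unchanged so later patterns see the original subject
}, $keywords);

echo count($hits) . " matches\n";
```

The callback only receives the match array, so this records what matched, not which keyword or pattern produced it; if you need that pairing, the explicit loops from the question are simpler.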
Increase execution time
Use this line in .htaccess:
php_value max_execution_time 80000
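If php_value in .htaccess is not available (it is typically only honoured when PHP runs as an Apache module, not under CGI/FastCGI), a minimal sketch of raising the limit from the script itself uses set_time_limit(). Passing 0 removes PHP's own limit for the rest of the request; the host may still disable this function, and web-server or proxy timeouts still apply.

```php
<?php
// Lift PHP's execution time limit for the rest of this request
// (0 means "no limit"); call this before the long-running loops.
set_time_limit(0);

// The ini value now reflects the change made by set_time_limit().
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
```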