foreach and preg_match on a heavy amount of data not working properly

I have two files: one contains keyword sequences (~20k lines), the other contains regular expressions (~2.5k).

I want to test each keyword against each regexp and print the ones that match. I checked my files, and that comes to around 22 750 000 tests. I am using the following code:

$count = 0;
$countM = 0;
foreach ($arrayRegexp as $r) {
    foreach ($arrayKeywords as $k) {
        $count++;
        if (preg_match($r, $k, $match)) {
            $countM++;
            echo $k.' matched with keywords '.$match[1].'<br/>';
        }
    }
}
echo "$count tests with $countM matches.";

Unfortunately, after computing for a while, only part of the actual matches are displayed, and the final line with the counts is never displayed. Even stranger, if I comment out the preg_match section and keep only the two foreach loops and the count display, everything works fine.

I believe this is due to the excessive amount of data to be processed, but I would like to know whether there are recommendations I didn't follow for this kind of operation. The regular expressions I use are very complicated, and I cannot change to something else.

Ideas anyone?


There are two optimization options:

  • Regular expressions can usually be combined into alternatives: /(regex1|regex2|...)/. PCRE can often evaluate alternatives faster than PHP can execute a loop.
  • I'm not sure whether this is faster at all (it modifies the subjects), but you could pass the keywords array directly as the subject parameter to preg_replace_callback(), thus eliminating the second loop.

As an example:

 $rx = implode("|", $arrayRegexp);  // assuming the patterns are stored without /regexp/ delimiters

 preg_replace_callback("#($rx)#", "print", $arrayKeywords);

Note that print is a language construct and cannot be passed as a callback, so define a custom print function that outputs and counts the results, and let it just return e.g. an empty string.
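A minimal sketch of the combined-pattern approach, assuming the patterns are stored without delimiters and contain no unescaped # characters. The sample arrays and the $report callback are hypothetical stand-ins, not the asker's data. A combined pattern reports each match once but does not say which of the original expressions produced it, and capture group numbers shift, so the sketch uses the whole match ($m[0]) instead of $match[1].

```php
<?php
// Hypothetical sample data; the real arrays would be loaded from the two files.
$arrayRegexp   = ['(foo\d+)', '(ba[rz])'];   // patterns stored without delimiters
$arrayKeywords = ['foo42 test', 'nothing here', 'a bar', 'baz and foo7'];

$countM = 0;
$lines  = [];

// Custom callback (the name is an assumption): records and counts each match,
// then returns an empty string as the replacement.
$report = function (array $m) use (&$countM, &$lines) {
    $countM++;
    $lines[] = 'matched: ' . $m[0];
    return '';
};

// Wrap every pattern in a non-capturing group so the alternation stays well-formed.
$rx = '(?:' . implode(')|(?:', $arrayRegexp) . ')';

// One call walks the whole keywords array; the modified strings are discarded.
preg_replace_callback("#$rx#", $report, $arrayKeywords);

echo implode("\n", $lines), "\n";
echo "$countM matches.\n";
```

With the sample data above this reports four matches (foo42, bar, baz, foo7). Whether it is actually faster on 2.5k complicated expressions would need to be measured.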

Come to think of it, preg_replace_callback would also take an array of regular expressions. Not sure if it cross-checks each regex on each string though.
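As far as I know, when given an array of patterns, preg_replace_callback applies each pattern in turn to each subject, feeding the result of one replacement into the next pattern. The sketch below relies on that behaviour and has the callback return the match unchanged, so every pattern still sees the original keyword. The sample data and the $keep callback are hypothetical.

```php
<?php
// Hypothetical sample data; here each pattern carries its own delimiters.
$patterns      = ['#(foo\d+)#', '#(ba[rz])#'];
$arrayKeywords = ['foo42 bar', 'nothing here'];

$hits = [];

// Return the match unchanged so later patterns are tested against
// the same text as the first one.
$keep = function (array $m) use (&$hits) {
    $hits[] = $m[1];
    return $m[0];
};

// The fifth argument receives the total number of replacements performed.
preg_replace_callback($patterns, $keep, $arrayKeywords, -1, $total);

echo implode(', ', $hits), "\n";
echo "$total matches.\n";
```

This keeps each regular expression separate, so the capture groups stay numbered as in the original patterns.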


Increase the maximum execution time.

Use this line in .htaccess:

php_value max_execution_time 80000
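The php_value directive only works when PHP runs as an Apache module. If that is not the case, the same setting can be raised from inside the script instead. A minimal sketch; the value simply mirrors the one above:

```php
<?php
// Raise the limit for this script only; 80000 mirrors the .htaccess value above.
ini_set('max_execution_time', '80000');

// Read the setting back to confirm it was applied.
$limit = ini_get('max_execution_time');
echo "max_execution_time is now $limit\n";
```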
