
PHP Loop - expression/function causing serious delay

I was wondering if anybody could shed any light on this problem. PHP 5.3.0 :)

I have a loop that grabs the contents of a large (200 MB) CSV file, handles the data and builds a stack of variables for MySQL inserts; once the loop is complete and the variables are created, I insert the information.

Now firstly, the MySQL insert is performing perfectly, no delays and all is fine; it's the LOOP itself that has the delay. I was originally using fgetcsv() to read the CSV file, but compared to file_get_contents() it had a serious delay, so I switched to file_get_contents(). The loop performs in a matter of seconds, until I attempt to add a function (I've also tried the expression inside the loop without the function to see if it helps) to create an array from the CSV data on each line. This is what is causing serious delays in the parsing time! (The difference is about 30 seconds for this 200 MB file, but it will depend on the size of the CSV file, I guess.)

Here's some code so you can see what I'm doing:

$filename = "file.csv";
$content = file_get_contents($filename);    
$rows = explode("\n", $content);    
foreach ($rows as $data) {    
    $data = preg_replace("/^\"(.*)\"$/","$1",preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data))); //THIS IS THE CULPRIT CAUSING SLOW LOADING?!?
}

Running the above loop, will perform almost instantly without the line:

$data = preg_replace("/^\"(.*)\"$/","$1",preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));

I've also tried creating a function as below (outside of loop):

function csv_string_to_array($str) {
    $expr = "/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/";
    $results = preg_split($expr, trim($str));
    return preg_replace("/^\"(.*)\"$/", "$1", $results);
}

and calling the function instead of the one liner:

$data = csv_string_to_array($data);

Again with no luck :(

Any help on this would be appreciated. I'm guessing the fgetcsv function works in a very similar way, based on the delay it causes: looping through and creating an array from each line of data.

Danny


The regex subexpressions (bounded by "(...)") are the issue. It's trivial to show that adding these to an expression can greatly reduce its performance. The first thing I would try is to stop using preg_replace() to simply remove leading and trailing double quotes (trim() would be a better bet for that) and see how much that helps. After that you might need to try a non-regex way to parse the line.
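As a rough sketch of that first suggestion (assuming the fields only ever carry plain surrounding double quotes, with no escaped quotes inside them to worry about), the second regex pass can be swapped for trim() with a character list:

$fields = preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data));

foreach ($fields as $i => $field) {
    // trim() with a $charlist strips the surrounding double quotes without another regex.
    $fields[$i] = trim($field, '"');
}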


I've partially found a solution: I'm batching the work so it only loops 1000 lines at a time (PHP loops in blocks of 1000 until it reaches the end of the file).

I'm then only setting:

$data = preg_replace("/^\"(.*)\"$/","$1",preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));

on those 1000 lines, so it isn't being run over the WHOLE file at once, which was causing the issues.

It now loops and inserts 1000 rows into the MySQL database in 1-2 seconds, which I'm happy with. I've set up the script to process 1000 rows, remember its last location, then move on to the next 1000 until it reaches the end, and it seems to be working OK!
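For reference, a minimal sketch of that kind of batching in a single run (the real script remembers its last position between runs, which this leaves out; $batchSize and $offset are purely illustrative names):

$rows = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$batchSize = 1000;

for ($offset = 0; $offset < count($rows); $offset += $batchSize) {
    foreach (array_slice($rows, $offset, $batchSize) as $data) {
        // Only this batch of lines is split/cleaned before its INSERT is run.
        $fields = preg_replace("/^\"(.*)\"$/", "$1",
            preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));
        // ... add $fields to the batched INSERT ...
    }
    // ... run the INSERT for this batch here ...
}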


I'd say the major culprit is the complexity of the preg_split() regexp. And the explode() is probably eating some seconds.

$content = file_get_contents($filename);    
$rows = explode("\n", $content); 

could be replaced by:

$rows = file($filename); // returns an array

But I second the above suggestion from ITroubs: fgetcsv() would probably be a much better solution.
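For example, a rough sketch combining file() with str_getcsv() (available as of PHP 5.3.0, the version mentioned in the question) instead of the hand-rolled regex splitting:

$rows = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($rows as $line) {
    // str_getcsv() handles the comma splitting and quote stripping for you.
    $data = str_getcsv($line, ',', '"');
    // ... use $data ...
}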


I would suggest using fgetcsv to parse the data. It seems like memory may be your biggest problem, so to avoid holding 200 MB of the file in RAM, parse it line by line as follows:

$fp = fopen($input, 'r');

while (($row = fgetcsv($fp, 0, ',', '"')) !== false) {
    $out = '"' . implode('", "', $row) . '"';  // quoted, comma-delimited output
    // perform work
}

fclose($fp);

Alternatively: using conditionals in preg is typically very expensive. It can sometimes be faster to process these lines using explode() and trim() with its $charlist parameter.
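A minimal sketch of that idea (only safe if this particular CSV never has commas inside quoted fields, which is an assumption about the data):

foreach ($rows as $data) {
    $fields = explode(',', trim($data));
    foreach ($fields as $i => $field) {
        // trim()'s $charlist strips surrounding whitespace and double quotes in one call.
        $fields[$i] = trim($field, " \t\"");
    }
    // ... use $fields ...
}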

The other alternative, if you still want to use preg, is to add the S modifier to try to speed up the expression.

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
S
When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
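For instance, the question's pattern with the S modifier appended is the same expression, just studied up front:

$expr = "/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/S"; // note the trailing S (study) modifier
$results = preg_split($expr, trim($data));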


By the way, I don't think your function is doing what you think it should: it won't actually modify the $rows array when you've exited from the loop. To do that, you need something more like:

foreach ($rows as $key => $data) {
    $rows[$key] = preg_replace("/^\"(.*)\"$/", "$1",
        preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));
}
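An equivalent approach is a by-reference foreach, which writes back into $rows directly (just unset the reference afterwards):

foreach ($rows as &$data) {
    $data = preg_replace("/^\"(.*)\"$/", "$1",
        preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));
}
unset($data); // break the reference to the last element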
