file_get_contents => PHP Fatal error: Allowed memory exhausted

2023-02-15 16:44 Q&A author:

I have no experience dealing with large files, so I am not sure what to do about this. I have attempted to read several large files using file_get_contents; the task is to clean and munge them using preg_replace().

My code runs fine on small files; however, the large files (40 MB) trigger a memory exhausted error:

PHP Fatal error:  Allowed memory size of 16777216 bytes exhausted (tried to allocate 41390283 bytes)

I was thinking of using fread() instead but I am not sure that'll work either. Is there a workaround for this problem?

Thanks for your input.

This is my code:

<?php
error_reporting(E_ALL);

##get find() results and remove DOS carriage returns.
##The error is thrown on the next line for large files!
$myData = file_get_contents("tmp11");
$newData = str_replace("^M", "", $myData);

##cleanup Model-Manufacturer field.
$pattern = '/(Model-Manufacturer:)(\n)(\w+)/i';
$replacement = '$1$3';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup Test_Version field and create comma delimited layout.
$pattern = '/(Test_Version=)(\d).(\d).(\d)(\n+)/';
$replacement = '$1$2.$3.$4      ';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup occasional empty Model-Manufacturer field.
$pattern = '/(Test_Version=)(\d).(\d).(\d)      (Test_Version=)/';
$replacement = '$1$2.$3.$4      Model-Manufacturer:N/A--$5';
$newData = preg_replace($pattern, $replacement, $newData);

##fix occasional Model-Manufacturer being incorrectly wrapped.
$newData = str_replace("--","\n",$newData);

##fix 'Binary file' message when find() utility cannot id file.
$pattern = '/(Binary file).*/';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);
$newData = removeEmptyLines($newData);

##replace colon with equal sign
$newData = str_replace("Model-Manufacturer:","Model-Manufacturer=",$newData);

##file stuff
$fh2 = fopen("tmp2","w");
fwrite($fh2, $newData);
fclose($fh2);

### Functions.

##Data cleanup
function removeEmptyLines($string)
{
        return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>

Firstly, you should understand that when you use file_get_contents you are fetching the entire file into a single string variable, and that variable is stored in the host's memory.

If that string is larger than the memory allotted to the PHP process (the memory_limit setting, which is 16777216 bytes, i.e. 16 MB, in your error message), PHP will halt and display the error message above.
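To make the limit concrete, here is a minimal sketch that compares a file's size with a memory_limit value. The helper names ini_size_to_bytes and file_fits_in_memory are hypothetical (they are not PHP built-ins), and the check is only a rough lower bound: a script that also builds modified copies of the data needs considerably more memory than the file size alone.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand value such as
// "16M", "512K" or "1G" to a number of bytes. "-1" (no limit) stays -1.
function ini_size_to_bytes($value)
{
    $value  = trim($value);
    $number = (int) $value; // (int) "16M" evaluates to 16

    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Hypothetical helper: true when the file is smaller than the given limit.
// A negative limit means "unlimited".
function file_fits_in_memory($filename, $limit)
{
    $bytes = ini_size_to_bytes($limit);
    return $bytes < 0 || filesize($filename) < $bytes;
}
```

It could be called as file_fits_in_memory("tmp11", ini_get('memory_limit')) before deciding whether to read the whole file at once.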

The way around this is to open the file as a pointer and then read it a chunk at a time. This way, if you had a 500 MB file, you could read the first 1 MB of data, do what you will with it, free that 1 MB from memory and replace it with the next 1 MB. This lets you control how much data you hold in memory at once.

An example of this can be seen below. I will create a function that takes a callback, in the style of Node.js:

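function file_get_contents_chunked($file, $chunk_size, $callback)
{
    // fopen() returns false (and raises a warning) on failure; it does not throw,
    // so the failure has to be checked explicitly.
    $handle = fopen($file, "r");
    if ($handle === false)
    {
        trigger_error("file_get_contents_chunked::could not open " . $file, E_USER_NOTICE);
        return false;
    }

    try
    {
        $i = 0;
        while (!feof($handle))
        {
            // Pass the chunk, the file handle and the chunk index to the callback.
            call_user_func_array($callback, array(fread($handle, $chunk_size), &$handle, $i));
            $i++;
        }
    }
    catch (Exception $e)
    {
        // Exceptions thrown by the callback end up here.
        trigger_error("file_get_contents_chunked::" . $e->getMessage(), E_USER_NOTICE);
        return false;
    }
    finally
    {
        fclose($handle);
    }

    return true;
}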

and then use it like so:

$success = file_get_contents_chunked("my/large/file",4096,function($chunk,&$handle,$iteration){
    /*
        * Do what you will with the {$chunk} here
        * {$handle} is passed in case you want to seek
        ** to different parts of the file
        * {$iteration} is the index of the chunk that has just been read, so
        * ($iteration * 4096) is the offset of this chunk within the file.
    */
    
});

if(!$success)
{
    //It Failed
}

One problem you will find is that you're running several regular expressions over an extremely large block of data. Not only that, but your regexes are built to match against the entire file.

With the above method your regex could become useless, as a chunk may contain only half of the data that a pattern needs to match. What you should do is revert to the native string functions such as

strpos
substr
trim
explode

for matching the strings. I have added support in the callback so that the handle and the current iteration are passed. This lets you work with the file directly within your callback, using functions like fseek, ftruncate and fwrite, for instance.
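One way to deal with chunks that cut a record in half is to carry the incomplete tail of each chunk over to the next read. The sketch below does this for newline-terminated lines using only strrpos, substr and explode. The function name for_each_line_chunked and its callback are hypothetical, and it assumes that a single line is the largest unit you need to match; patterns that span several lines would need a larger carry-over.

```php
<?php
// Hypothetical helper: read $file in chunks of $chunk_size bytes and call
// $on_line once for every complete line (without its trailing "\n").
function for_each_line_chunked($file, $chunk_size, $on_line)
{
    $handle = fopen($file, "r");
    if ($handle === false) {
        return false;
    }

    $carry = ""; // incomplete line left over from the previous chunk
    while (!feof($handle)) {
        $chunk = fread($handle, $chunk_size);
        if ($chunk === false) {
            break;
        }

        $buffer = $carry . $chunk;
        $last_newline = strrpos($buffer, "\n");
        if ($last_newline === false) {
            // No complete line in the buffer yet; keep accumulating.
            $carry = $buffer;
            continue;
        }

        // Everything before the last newline consists of complete lines;
        // everything after it is the start of a line that is not finished yet.
        $complete = substr($buffer, 0, $last_newline);
        $carry    = (string) substr($buffer, $last_newline + 1);

        foreach (explode("\n", $complete) as $line) {
            $on_line($line);
        }
    }
    fclose($handle);

    // The file may not end with a newline; flush whatever is left.
    if ($carry !== "") {
        $on_line($carry);
    }
    return true;
}
```

Memory use stays at roughly one chunk plus the longest line, regardless of the file size.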

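The way you're building your string manipulation is not efficient: it keeps the whole file in memory more than once ($myData and $newData, plus the result of each replacement). Processing the file in chunks, as proposed above, is a much better way.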

A pretty ugly solution is to adjust your memory limit depending on the file size:

$filename = "yourfile.txt";
ini_set ('memory_limit', filesize ($filename) + 4000000);
$contents = file_get_contents ($filename);

The right solution would be to consider whether you can process the file in smaller chunks, or to use command-line tools from PHP.

If your file is line-based you can also use fgets to process it line-by-line.
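A minimal sketch of the fgets approach follows. The function name process_file_by_line and the convention that returning null drops a line are assumptions made for this illustration, not part of the original code.

```php
<?php
// Hypothetical helper: copy $source to $target one line at a time, passing
// each line (without its line ending) through $transform.
// If $transform returns null, the line is dropped.
function process_file_by_line($source, $target, $transform)
{
    $in = fopen($source, "r");
    if ($in === false) {
        return false;
    }
    $out = fopen($target, "w");
    if ($out === false) {
        fclose($in);
        return false;
    }

    // fgets() returns one line per call and false at end of file,
    // so only a single line is held in memory at any time.
    while (($line = fgets($in)) !== false) {
        $result = $transform(rtrim($line, "\r\n"));
        if ($result !== null) {
            fwrite($out, $result . "\n");
        }
    }

    fclose($in);
    fclose($out);
    return true;
}

// Possible usage with the file names from the question:
// process_file_by_line("tmp11", "tmp2", function ($line) {
//     if (strpos($line, "Binary file") === 0) {
//         return null; // drop lines that start with "Binary file"
//     }
//     return str_replace("Model-Manufacturer:", "Model-Manufacturer=", $line);
// });
```

Because rtrim strips both "\r" and "\n", this also removes DOS carriage returns at the end of each line. Replacements that need to see two lines at once, such as joining a field name with the value on the following line, would require remembering the previous line between iterations.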

For processing just n rows at a time, we can use generators in PHP (here n = 1000).

This is how it works: read n lines, process them, come back at line n+1, read the next n lines, process them, and so on.

Here's the code for doing so.

<?php
class readLargeCSV{

    private $file;
    private $delimiter;
    private $iterator;
    private $header;

    public function __construct($filename, $delimiter = "\t"){
        $this->file = fopen($filename, 'r');
        $this->delimiter = $delimiter;
        $this->iterator = 0;
        $this->header = null;
    }

    // Generator: yields arrays of up to 1000 rows, each row keyed by the header names.
    public function csvToArray()
    {
        $data = array();
        // A length of 0 lets fgetcsv() read lines of any length.
        while (($row = fgetcsv($this->file, 0, $this->delimiter)) !== false)
        {
            if(!$this->header){
                // The first row is the header.
                $this->header = $row;
            }
            else{
                $this->iterator++;
                $data[] = array_combine($this->header, $row);
                if($this->iterator % 1000 == 0){
                    // A full batch is ready: hand it out and start a new one.
                    $chunk = $data;
                    $data = array();
                    yield $chunk;
                }
            }
        }
        fclose($this->file);
        // Yield the final, partial batch, if there is one.
        if(!empty($data)){
            yield $data;
        }
        return;
    }
}

And for reading it, you can use this.

    $file = database_path('path/to/csvfile/XYZ.csv');
    $csv_reader = new readLargeCSV($file, ",");


    foreach($csv_reader->csvToArray() as $data){
     // you can do whatever you want with the $data.
    }

Here $data contains 1000 rows from the CSV, or, for the last batch, the remaining rows (the row count modulo 1000).

A detailed explanation for this can be found here https://medium.com/@aashish.gaba097/database-seeding-with-large-files-in-laravel-be5b2aceaa0b

My advice would be to use fread. It may be a little slower, but you won't have to use all your memory. For instance:

// This uses filesize($oldFile) bytes of memory
file_put_contents($newFile, file_get_contents($oldFile));
// And this uses about 8192 bytes
$pNew = fopen($newFile, 'w');
$pOld = fopen($oldFile, 'r');
while (!feof($pOld)) {
    fwrite($pNew, fread($pOld, 8192));
}
fclose($pOld);
fclose($pNew);
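The same loop can also transform the data on the way through. As a sketch, the function below removes carriage returns while copying, assuming that the "^M" in the question stands for the "\r" character. The function name copy_without_carriage_returns is hypothetical. Replacing a single byte is safe with fixed-size chunks because a one-byte match can never be split across two chunks; multi-character patterns do not have this property.

```php
<?php
// Hypothetical helper: copy $oldFile to $newFile in chunks of $chunk_size
// bytes, removing every carriage return ("\r") along the way.
function copy_without_carriage_returns($oldFile, $newFile, $chunk_size = 8192)
{
    // Binary mode avoids any line-ending translation by the platform.
    $pOld = fopen($oldFile, 'rb');
    if ($pOld === false) {
        return false;
    }
    $pNew = fopen($newFile, 'wb');
    if ($pNew === false) {
        fclose($pOld);
        return false;
    }

    while (!feof($pOld)) {
        $chunk = fread($pOld, $chunk_size);
        if ($chunk === false) {
            break;
        }
        fwrite($pNew, str_replace("\r", "", $chunk));
    }

    fclose($pOld);
    fclose($pNew);
    return true;
}
```

With the file names from the question, this could be called as copy_without_carriage_returns("tmp11", "tmp2").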
