Fastest way to calculate the size of a file opened inside the code (PHP)

I know there are quite a few built-in functions in PHP for getting the size of a file, among them filesize, stat, and ftell.

My question is about ftell, which is quite interesting: it returns the current position of the file pointer as an integer.

Is it possible to get the size of the file using the ftell function? If so, how?

Scenario:

  1. System (code) opens an existing file with mode "a" to append content.
  2. File pointer points to the end of the file.
  3. System updates the content into the file.
  4. System uses ftell to calculate the size of the file.


fstat determines the file size without any acrobatics:

$f = fopen('file', 'r+');
$stat = fstat($f);
$size = $stat['size'];

ftell cannot be relied on when the file has been opened with the append ("a") flag, because its result is undefined for append-only streams. In other modes, you also have to seek to the end of the file with fseek($f, 0, SEEK_END) first.
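
To illustrate both approaches, here is a minimal sketch, assuming a scratch file created with tempnam(). The helper name size_via_seek is hypothetical and not part of the original answer. It measures the size with fseek + ftell on a handle opened in "r+" mode and restores the previous position, then uses fstat on a handle opened in "a" mode, where ftell should not be trusted.

```php
<?php
// Hypothetical helper: size of an open, seekable handle via fseek + ftell.
// It restores the original position so the caller's reads are unaffected.
function size_via_seek($handle): int
{
    $current = ftell($handle);
    if ($current === false || fseek($handle, 0, SEEK_END) !== 0) {
        throw new RuntimeException('Stream is not seekable');
    }
    $size = ftell($handle);
    fseek($handle, $current, SEEK_SET);
    return $size;
}

// Scratch file (an assumption for this example only).
$path = tempnam(sys_get_temp_dir(), 'size');
file_put_contents($path, 'hello');

// "r+" mode: fseek + ftell works and the position is restored afterwards.
$rw = fopen($path, 'r+');
$sizeBySeek = size_via_seek($rw);
$positionAfter = ftell($rw);
fclose($rw);

// "a" mode: append, flush, then ask fstat instead of ftell.
$ap = fopen($path, 'a');
fwrite($ap, ' world');
fflush($ap);
$sizeByStat = fstat($ap)['size'];
fclose($ap);
unlink($path);

echo "$sizeBySeek $positionAfter $sizeByStat\n"; // prints "5 0 11"
```

Restoring the position matters mainly for handles that are still being read. For a handle that only appends, fstat avoids the question entirely.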


ftell() can tell you how many bytes are supposed to be in the file (its logical size), but not how many are actually stored on disk. Sparse files take up less space on disk than the value you get by seeking to the end and calling ftell().
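
The difference can be seen with a small sketch, assuming a filesystem that supports sparse files. Whether the hole really stays unallocated depends on the filesystem, so the allocated figure will vary. The 'blocks' field of fstat counts 512-byte units on POSIX systems and is -1 on Windows.

```php
<?php
// Build a file with a large hole: seek far past the start and write one byte.
$path = tempnam(sys_get_temp_dir(), 'sparse');
$f = fopen($path, 'r+');
fseek($f, 10 * 1024 * 1024);   // move to a 10 MiB offset without writing
fwrite($f, 'x');               // the file is now 10 MiB + 1 byte long
fflush($f);

$stat = fstat($f);
$logicalSize = $stat['size'];  // logical size reported by fstat

fseek($f, 0, SEEK_END);
$sizeByTell = ftell($f);       // the same logical size, via fseek + ftell

// Space actually allocated on disk, where the platform reports it.
$allocated = $stat['blocks'] >= 0 ? $stat['blocks'] * 512 : null;

fclose($f);
unlink($path);

printf(
    "logical: %d bytes, ftell: %d bytes, allocated: %s\n",
    $logicalSize,
    $sizeByTell,
    $allocated === null ? 'unknown' : $allocated . ' bytes'
);
```

Both fstat and ftell report the logical size here. Only the block count hints at how much disk space the file really occupies.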


I wrote a benchmark to settle this topic. So that nobody can argue that some kind of PHP caching skews the results, the unique test files are created in another process.

This is a new benchmark, written to leave no doubt.

The tests ignore the fopen and fclose time, since the question asks for the fastest way to calculate the size of an already opened file. Each test runs on 200 files.

The code that creates the files in a separate process is in the first comment on this post.

<?php
class timeIt
{
    static private $times   = [];
    static function new()
    {
        self::$times[] = hrtime(true);
    }
    static function stop()
    {
        self::$times[] = -1;
    }
    static function dif()
    {
        $dif    = 0;
        $sum    = 0;
        $i      = count(self::$times) - 1;

        if (self::$times[$i] === -1)
            unset(self::$times[$i--]);
        
        for ($i = count(self::$times) - 1; $i > 0; --$i) {
            if (self::$times[$i - 1] === -1) {
                $sum    += $dif;
                $dif    = 0;
                --$i;
                continue;
            }
            $dif    += self::$times[$i] - self::$times[$i - 1];
        }
        return $sum + $dif;
    }
    static function printNReset()
    {
        echo "diffTime:" . self::dif() . "\n\n";
        self::reset();
    }
    static function reset()
    {
        self::$times    = [];
    }
}
function fseek_size_from_current($handle)
{
    $current  = ftell($handle);
    fseek($handle, 0, SEEK_END);
    $size   = ftell($handle);
    fseek($handle, $current);
    
    return $size;
}
function fseek_size_from_start($handle)
{
    fseek($handle, 0, SEEK_END);
    $size   = ftell($handle);
    fseek($handle, 0);
    
    return $size;
}

function uniqueProcessId()
{
    return (string) hrtime(true);
}

function getUniqueForeignProcessFiles($quantity, $size)
{
    $returnedFilenames   = $filenames = [];
    while ($quantity--){
        $filename   = uniqueProcessId();
        $filenames[$filename]   = $size;
        $returnedFilenames[]    = __DIR__ . DIRECTORY_SEPARATOR . $filename;
    }

    $data       = base64_encode(json_encode($filenames));
    $foreignCgi = __DIR__ . DIRECTORY_SEPARATOR . "createFileByNames.php";
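    // Quote both arguments so paths with spaces or shell metacharacters are safe.
    $command    = "php " . escapeshellarg($foreignCgi) . " " . escapeshellarg($data);
    if (shell_exec($command) !== 'ok')
        die("An error occurred");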

    return $returnedFilenames;
}
const FILESIZE  = 20 * 1024 * 1024;

foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = fstat($handle)['size'];
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**fstat**\n";
timeIt::printNReset();

foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = fseek_size_from_start($handle);
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**fseek with static/defined**\n";
timeIt::printNReset();


foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = fseek_size_from_current($handle);
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**fseek with current offset**\n";
timeIt::printNReset();


foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = filesize($filename);
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**filesize after fopen**\n";
timeIt::printNReset();

foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    timeIt::new();
    $size   = filesize($filename);
    timeIt::new();
    timeIt::stop();
    unlink($filename);
}
echo "**filesize no fopen**\n";
timeIt::printNReset();

Results with 20MB files, times in nanoseconds:

fstat diffTime:2745700

fseek with static/defined diffTime:1267400

fseek with current offset diffTime:983500

filesize after fopen diffTime:283052500

filesize no fopen diffTime:4259203800

Results with 1MB files, times in nanoseconds:

fstat diffTime:1490400

fseek with static/defined diffTime:706800

fseek with current offset diffTime:837900

filesize after fopen diffTime:22763300

filesize no fopen diffTime:216512800

This answer previously contained a different benchmark, whose code I removed to keep the answer cleaner. That benchmark used files created by its own process, and the assumption was:

ftell + fseek takes half the time of fstat['size'], even when wrapped in another function and with both functions called twice. fstat is slower because it returns a lot more information than just the file size, so if your code needs that other information as well, for example to check for changes, just stick with fstat.

The current benchmark shows that assumption to be valid: fseek + ftell is about 1.8-2.8x faster than fstat for files of 1-20MB.

Feel free to run your own benchmarks and share your results.
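
For a quick comparison on your own machine, a much smaller sketch than the benchmark above might look like the following. It is not the original benchmark. The helper name time_calls and the iteration count are assumptions, and because it calls each function repeatedly on a single file created by the same process, operating system caching will flatter both approaches. It needs PHP 7.3 or later for hrtime.

```php
<?php
// Hypothetical helper: run $fn repeatedly and return
// [total nanoseconds, last result].
function time_calls(callable $fn, int $iterations): array
{
    $result = null;
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $result = $fn();
    }
    return [hrtime(true) - $start, $result];
}

// 1 MiB scratch file, created by this process (unlike the benchmark above).
$path = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($path, str_repeat('x', 1024 * 1024));
$handle = fopen($path, 'r');

[$statNs, $statSize] = time_calls(function () use ($handle) {
    return fstat($handle)['size'];
}, 10000);

[$seekNs, $seekSize] = time_calls(function () use ($handle) {
    $current = ftell($handle);
    fseek($handle, 0, SEEK_END);
    $size = ftell($handle);
    fseek($handle, $current);   // restore the previous position
    return $size;
}, 10000);

fclose($handle);
unlink($path);

echo "fstat:         {$statNs} ns (size {$statSize})\n";
echo "fseek + ftell: {$seekNs} ns (size {$seekSize})\n";
```

The timings depend on the machine, the filesystem, and the PHP version, so only the two sizes are expected to be identical between runs.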


Thanks @Phihag. With your tip on combining fseek with ftell, I am able to calculate the size in a much better way. See the code here: http://pastebin.com/7XCqu0WR

<?php
$fp = fopen("/tmp/temp.rock", "a+");

fwrite($fp, "This is the contents");

echo "Time taken to calculate the size by filesize function: ";
$t = microtime(true);
$ts1 = filesize("/tmp/temp.rock") . "\n";
echo microtime(true) - $t . "\n";

echo "Time taken to calculate the size by fstat function:";
$t = microtime(true);
$ts1 = fstat($fp); // fstat returns an array, so it must not be concatenated with a string
$size = $ts1["size"];
echo microtime(true) - $t . "\n";

echo "Time taken to calculate the size by fseek and ftell function: ";
$t = microtime(true);
fseek($fp, 0, SEEK_END);
$ts2 = ftell($fp) . "\n";
echo microtime(true) - $t . "\n";

fclose($fp);

/**
OUTPUT:

Time taken to calculate the size by filesize function:2.4080276489258E-5
Time taken to calculate the size by fstat function:2.9802322387695E-5
Time taken to calculate the size by fseek and ftell function:1.2874603271484E-5

*/
?>