Achieving an optimum 'fread' chunk size?
Alright, I know my question is not entirely specific, as an optimum fread chunk size is more of a trial-and-error thing. However, I was hoping some of you could shed some light on this.
This also involves server-related stuff, so I'm not sure if Stackoverflow is entirely the right place, but it did seem a better choice than ServerFault.
To begin with, I'm going to post two screenshots:
http://screensnapr.com/e/pnF1ik.png
http://screensnapr.com/e/z85FWG.png
Now I've got a script that uses PHP to stream files to the end user. It uses fopen and fread to stream the file. Most of these files are larger than 100MB. My concern is that sometimes my server stats end up looking like the screenshots above. The two screens are from different servers; both are dedicated file-streaming boxes. Nothing else runs on them except PHP streaming files to the end user.
I'm confused by the fact that even when my servers are only transmitting an aggregate total of about 4MB/sec to the end client(s), the disk reads are running at 100MB/s and over. This insane level of IO eventually locks up my CPU as it waits for IO and tasks pile up; eventually my server becomes completely unresponsive and requires a reboot.
My current fread chunk size is set to 8 * 1024. My question is, will changing the block size and experimenting help at all? The client is only downloading data at an average of ~4MB/sec. So why is the disk reading data at 100MB/sec? I've tried every possible solution on the server end; I even swapped the disks with new ones to rule out a potential disk issue. It looks to me like a script issue; maybe PHP is reading the entire file from disk regardless of how much it transfers to the end client?
Any help at all would be appreciated. And if this belongs on ServerFault, then my apologies for posting here. And if you guys need me to post snippets from the actual script, I can do that too.
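For reference, the streaming part boils down to roughly this (a simplified sketch, not the actual script):

$handle = fopen($path, 'rb'); // $path comes from the request; placeholder here

while (feof($handle) !== true)
{
    echo fread($handle, 8 * 1024); // current chunk size: 8 KiB
    flush();                       // push the chunk out to the web server
}

fclose($handle);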
8 * 1024 bytes? That seems perfectly reasonable, and if so, your high disk I/O is probably related to concurrent requests. Have you considered implementing some sort of bandwidth throttling? Here is a PHP-only implementation I did for my framework, phunction:
public static function Download($path, $speed = null, $multipart = false)
{
    if (strncmp('cli', PHP_SAPI, 3) !== 0)
    {
        if (is_file($path) === true)
        {
            while (ob_get_level() > 0)
            {
                ob_end_clean();
            }

            $file = @fopen($path, 'rb');
            $size = sprintf('%u', filesize($path));
            $speed = (empty($speed) === true) ? 1024 : floatval($speed);

            if (is_resource($file) === true)
            {
                set_time_limit(0);
                session_write_close();

                if ($multipart === true)
                {
                    $range = array(0, $size - 1);

                    if (array_key_exists('HTTP_RANGE', $_SERVER) === true)
                    {
                        $range = array_map('intval', explode('-', preg_replace('~.*=([^,]*).*~', '$1', $_SERVER['HTTP_RANGE'])));

                        if (empty($range[1]) === true)
                        {
                            $range[1] = $size - 1;
                        }

                        foreach ($range as $key => $value)
                        {
                            $range[$key] = max(0, min($value, $size - 1));
                        }

                        if (($range[0] > 0) || ($range[1] < ($size - 1)))
                        {
                            ph()->HTTP->Code(206, 'Partial Content');
                        }
                    }

                    header('Accept-Ranges: bytes');
                    header('Content-Range: bytes ' . sprintf('%u-%u/%u', $range[0], $range[1], $size));
                }

                else
                {
                    $range = array(0, $size - 1);
                }

                header('Pragma: public');
                header('Cache-Control: public, no-cache');
                header('Content-Type: application/octet-stream');
                header('Content-Length: ' . sprintf('%u', $range[1] - $range[0] + 1));
                header('Content-Disposition: attachment; filename="' . basename($path) . '"');
                header('Content-Transfer-Encoding: binary');

                if ($range[0] > 0)
                {
                    fseek($file, $range[0]);
                }

                while ((feof($file) !== true) && (connection_status() === CONNECTION_NORMAL))
                {
                    ph()->HTTP->Flush(fread($file, round($speed * 1024)));
                    ph()->HTTP->Sleep(1);
                }

                fclose($file);
            }

            exit();
        }

        else
        {
            ph()->HTTP->Code(404, 'Not Found');
        }
    }

    return false;
}
The above method has some minor dependencies and adds some functionality you may not need, like multi-part downloads, but you should be able to reuse the throttling logic without problems.
// serve file at 4 MBps (max)
Download('/path/to/file.ext', 4 * 1024);
You can even be more generous by default and decrease the $speed depending on the value you get from the first index of sys_getloadavg() to avoid stressing your CPU.
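For example (just a sketch of that idea; the load thresholds are arbitrary):

// Be generous by default, back off when the 1-minute load average climbs.
$load = sys_getloadavg();
$speed = 4 * 1024; // 4 MB/s per client

if ($load[0] > 4)
{
    $speed = 1024; // drop to 1 MB/s under heavy load
}

Download('/path/to/file.ext', $speed);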
Generally, it can happen that actual disk I/O is faster than userspace I/O because of prefetching and filesystem overhead. However, that should never lock up your server. The chunk size will have little to no impact on that as long as it's between 1 KiB and, say, 16 MiB. Still, instead of using PHP to stream files, you should really consider the much more optimized readfile.
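For instance, a minimal readfile() version could look like this (sketch only; the headers and path are placeholders):

$path = '/path/to/file.ext';

while (ob_get_level() > 0)
{
    ob_end_clean(); // make sure PHP isn't buffering the whole file
}

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
header('Content-Disposition: attachment; filename="' . basename($path) . '"');

readfile($path); // hand the copy loop to PHP's optimized C implementation
exit();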
That being said, barring a serious programming error, this behavior is probably not directly related to your small loop. First, use iotop to find out which program is actually causing the I/O. If it's PHP (how many concurrent scripts? Sorry, the screenshots seem to be completely garbled and show next to no useful information), rule out output buffering and have a look at memory consumption as well as the various PHP tuning parameters (phpinfo gives a good overview). By the way, htop is a much nicer alternative to top ;).
"Now I've got a script that uses PHP to stream files to the end user."
Just to clarify what's really going on: Apache is responsible for the actual "stream". PHP hands its output directly to Apache, so the end user of your PHP script is effectively Apache. Apache then handles delivery to the user, which in your case is apparently around 4MB/sec. Apache itself has no such restriction, though; it can take all of your output at once and then handle a delayed delivery to the client. To prove this, you should be able to see your script exit before the stream is delivered. If your script then turns around and tries to deliver another file, you're queuing up Apache against your server resources.
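A quick way to check this (hypothetical logging wrapped around a loop like yours) is to log when the loop finishes and compare it with when the client actually finishes downloading:

$start = microtime(true);

$handle = fopen($path, 'rb');

while (feof($handle) !== true)
{
    echo fread($handle, 8 * 1024);
}

fclose($handle);

// If this logs after a few seconds while the client keeps downloading for minutes,
// Apache buffered the output and PHP finished long before delivery.
error_log(sprintf('Streaming loop finished in %.1fs', microtime(true) - $start));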
A better solution may be to let Apache handle the file delivery completely by letting the user request the download from an accessible URL. Obviously this is limited to static content. Fixing your above script would require delaying some of the file reads so that Apache delivers the chunks instead of buffering the whole output.
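For static content, that could be as simple as redirecting to a web-accessible copy of the file (the URL layout here is hypothetical):

$publicUrl = '/downloads/' . rawurlencode(basename($path)); // assumes the file sits under the document root

header('Location: ' . $publicUrl, true, 302);
exit();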
EDIT: If your memory is fine and we can rule out swap activity, then it may simply be concurrent file read requests. If 5 files of 100 MB each are requested, that's 500 MB of read activity. Apache will not throttle your script and will in fact buffer all output, which can be over 100 MB at a time. This accounts for a lot of the disk I/O activity, because each request results in reading the complete file into the buffer. Using a throttle as suggested by Alix would allow for more concurrent requests, but eventually you're going to reach a limit. We can't be sure how fast the user receives the data from Apache, so you might have to find a balance for the throttle size that lets Apache and PHP work with chunks of your files instead of the whole file.
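To put rough numbers on it (illustrative only, using the figures above):

// Unthrottled: 5 concurrent requests for 100 MB files, each buffered in full by Apache.
$requests = 5;
$fileMB = 100;
echo ($requests * $fileMB) . " MB of disk reads queued at once\n"; // 500 MB

// Throttled: worst-case aggregate read rate scales with concurrency instead.
$throttleMB = 4; // per-client cap in MB/s
echo ($requests * $throttleMB) . " MB/s worst-case aggregate read rate\n"; // 20 MB/s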