Reading a large CSV file in PHP
I have a very large CSV file. 51427 lines to be exact.
Is there not a way where I can only read the required lines into an array? That would speed things up 开发者_开发百科significantly.
You might want to look at streaming the csv file. Send start file location, start position and number of bytes to read as get paramters to a ProgressiveReader.php
class NoFileFoundException extends Exception {
function __toString() {
return '<h1><b>ERROR:</b> could not find ('
.$this->getMessage().
') please check your settings.</h1>';
}
}
class NoFileOpenException extends Exception {
function __toString() {
return '<h1><b>ERROR:</b> could not open ('
.$this->getMessage().
') please check your settings.</h1>';
}
}
interface Reader {
function setFileName($fName);
function open();
function setBufferOffset($offset);
function bufferSize();
function isOffset();
function setPacketSize($size);
function read();
function isEOF();
function close();
function readAll();
}
class ProgressiveReader implements Reader {
private $fName;
private $fileHandler;
private $offset = 0;
private $packetSize = 0;
public function setFileName($fName) {
$this->fName = $fName;
if(!file_exists($this->fName)) {
throw new NoFileFoundException($this->fName);
}
}
public function open() {
try {
$this->fileHandler = fopen($this->fName, 'rb');
}
catch (Exception $e) {
throw new NoFileOpenException($this->fName);
}
fseek($this->fileHandler, $this->offset);
}
public function setBufferOffset($offset) {
$this->offset = $offset;
}
public function bufferSize() {
return filesize($this->fName) - (($this->offset > 0) ? ($this->offset + 1) : 0);
}
public function isOffset() {
if($this->offset === 0) {
return false;
}
return true;
}
public function setPacketSize($size) {
$this->packetSize = $size;
}
public function read() {
return fread($this->fileHandler, $this->packetSize);
}
public function isEOF() {
return feof($this->fileHandler);
}
public function close() {
if($this->fileHandler) {
fclose($this->fileHandler);
}
}
public function readAll() {
return fread($this->fileHandler, filesize($this->fName));
}
}
Here are the unit tests:
require_once 'PHPUnit/Framework.php';
require_once dirname(__FILE__).'/../ProgressiveReader.php';
class ProgressiveReaderTest extends PHPUnit_Framework_TestCase {
protected $reader;
private $fp;
private $fname = "Test.txt";
protected function setUp() {
$this->createTestFile();
$this->reader = new ProgressiveReader();
}
protected function tearDown() {
$this->reader->close();
}
public function test_isValidFile() {
$this->reader->setFileName($this->fname);
}
public function test_isNotValidFile() {
try {
$this->reader->setFileName("nothing.tada");
}
catch (Exception $e) {
return;
}
$this->fail();
}
public function test_isFileOpen() {
$this->reader->setFileName($this->fname);
$this->reader->open();
}
public function test_couldNotOpenFile() {
$this->reader->setFileName($this->fname);
try {
$this->deleteTestFile();
$this->reader->open();
}
catch (Exception $e) {
return;
}
$this->fail();
}
public function test_bufferSizeZeroOffset() {
$this->reader->setFileName($this->fname);
$this->reader->open();
$this->assertEquals($this->reader->bufferSize(), 12);
}
public function test_bufferSizeTwoOffset() {
$this->reader->setFileName($this->fname);
$this->reader->setBufferOffset(2);
$this->reader->open();
$this->assertEquals($this->reader->bufferSize(), 9);
}
public function test_readBuffer() {
$this->reader->setFileName($this->fname);
$this->reader->setBufferOffset(0);
$this->reader->setPacketSize(1);
$this->reader->open();
$this->assertEquals($this->reader->read(), "T");
}
public function test_readBufferWithOffset() {
$this->reader->setFileName($this->fname);
$this->reader->setBufferOffset(2);
$this->reader->setPacketSize(1);
$this->reader->open();
$this->assertEquals($this->reader->read(), "S");
}
public function test_readSuccesive() {
$this->reader->setFileName($this->fname);
$this->reader->setBufferOffset(0);
$this->reader->setPacketSize(6);
$this->reader->open();
$this->assertEquals($this->reader->read(), "TEST1\n");
$this->assertEquals($this->reader->read(), "TEST2\n");
}
public function test_readEntireBuffer() {
$this->reader->setFileName($this->fname);
$this->reader->open();
$this->assertEquals($this->reader->readAll(), "TEST1\nTEST2\n");
}
public function test_isNotEOF() {
$this->reader->setFileName($this->fname);
$this->reader->setBufferOffset(2);
$this->reader->setPacketSize(1);
$this->reader->open();
$this->assertFalse($this->reader->isEOF());
}
public function test_isEOF() {
$this->reader->setFileName($this->fname);
$this->reader->setBufferOffset(0);
$this->reader->setPacketSize(15);
$this->reader->open();
$this->reader->read();
$this->assertTrue($this->reader->isEOF());
}
public function test_isOffset() {
$this->reader->setFileName($this->fname);
$this->reader->setBufferOffset(2);
$this->assertTrue($this->reader->isOffset());
}
public function test_isNotOffset() {
$this->reader->setFileName($this->fname);
$this->assertFalse($this->reader->isOffset());
}
private function createTestFile() {
$this->fp = fopen($this->fname, "wb");
fwrite($this->fp, "TEST1\n");
fwrite($this->fp, "TEST2\n");
flush();
fclose($this->fp);
}
private function deleteTestFile() {
if(file_exists($this->fname)) {
unlink($this->fname);
}
}
}
Can you connect to the database server directly?
If so, I would consider using a 3rd party program like SQLyog to import your csv.
You could also upload the file and use the mysql shell to import that data directly:
LOAD DATA INFILE '/path/to/your_file.csv' INTO TABLE table_name FIELDS TERMINATED BY ',';
This reads the entire CSV file into an array
All of the 50000+ lines?
Advance to the start of the wanted block of the file from PHP by reading line by line (fgets()) and then add each (needed) line to the array; you can get the array of the line with fgetcsv().
edit: I don't know the exact details, but I sense that reading everything into a data structure costs way more than reading only what we need.
Have you tried using bash/shell(if you're on linux) to import your csv into mysql? You can also use ruby or perl or whatnot, as I think that's what you should use instead of php(or any web app) to import the file.
I would suggest using the fast MySQL LOAD DATA INFILE command:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
If that's not an option, you could split the CSV file (assuming access to a shell).
Your script probably takes too long and it gets terminated.
You should look for max_execution_time
directive in php.ini
and set it to something that suits you.
The default max_execution_time
is set to 30 seconds, so your script probably gets terminated.
If you also have scripts that need to be limited in time, you can do that individually by calling set_time_init();
Bah! Ignore this answer. Was a duplicate. See Scorchio's mention of fgetcsv() above.
精彩评论