Am I taking the proper approach to dealing with these files? (CSV with PHP)
I am a student working on a placement for the summer. I have been given the task of dealing with data entry from Excel to a SQL Server database for surveys that were carried out over a number of years. The task is outlined below:
There are three tables: a main event, an individual event, and an individual. An event has many individual events, and an individual event has many individuals. My code concerns just the last two tables.
I read two files, a list of all individual events in one file, and a list of all individuals in the other. The individual's data tells me what individual event it is associated with.
My code basically reads an individual event, then looks through the second file for any associated individuals. For each line in the individuals file, if it is associated, it is inserted into the proper table; otherwise it is written to a new file. Once the whole file is traversed, the new file is copied over the old one, thus removing data already entered into the database.
This copying across has knocked a good 3 minutes off the execution time compared with simply re-reading the full individuals file again and again. But is there a better approach? My execution time for my sample data is ~47 seconds... ideally I'd like that lower.
Any advice, however trivial, would be appreciated.
EDIT: This is a cut-down version of the code I am using
<?php
//not shown:
//connect to database
//input event data
//get the id of the event
//open files
$s_handle = fopen($_FILES['surveyfile']['tmp_name'],'r');//open survey file
copy($_FILES['cocklefile']['tmp_name'],'file1.csv');//make copy of the cockle file
//read files
$s_csv = fgetcsv($s_handle,'0',',');
//read lines and print lines
// then input data via sql
while (! feof($s_handle))
{
$max_index = count($s_csv);
$s_csv[$max_index]='';
foreach($s_csv as $key => $val)//blank out any missing columns in place
{
if(!isset($val))
$s_csv[$key] = '';
}
$grid_no = $s_csv[0];
$sub_loc = $s_csv[1];
/*
.define more variables
.*/
$sql = "INSERT INTO indipendant_event"
."(parent_id,grid_number,sub_location,....)"
."VALUES ("
."'{$event_id}',"
."'{$grid_no}',"
//...
.");";
if (!odbc_exec($con,$sql))
{
echo "WARNING: SQL INSERT INTO fssbur.cockle_quadrat FAILED. PHP.";
}
//get the ID just assigned (note: MAX() assumes nothing else inserts rows concurrently)
$sql = "SELECT MAX(ind_event_id) "
."FROM independant_event";
$return = odbc_exec($con,$sql);
$ind_event_id = odbc_result($return, 1);
//insert individuals
$c_2 = fopen('file2.csv','w');//create file c_2 to write to
$c_1 = fopen('file1.csv','r');//open the data to read
$c_csv = fgetcsv($c_1,'0',',');//get the first line of data
while(! feof($c_1))
{
for($i=0;$i<9;$i++)//make sure there's a value in each column
{
if(!isset($c_csv[$i]))
$c_csv[$i] = '';
}
//give values meaningful names
$stat_no = $c_csv[0];
$sample_method = $c_csv[1];
//....
//check whether the current line corresponds to the current station
if (strcmp(strtolower($stat_no),strtolower($grid_no))==0)
{
$sql = "INSERT INTO fssbur2.cockle"
."(parent_id,sampling_method,shell_height,shell_width,age,weight,alive,discarded,damage)"
."VALUES("
."'{$ind_event_id}',"
."'{$sample_method}',"
//...
."'{$damage}');";
//write data if it corresponds
if (!odbc_exec($con,$sql))
{
echo "WARNING: SQL INSERT INTO fssbur.cockle FAILED. PHP.";
}
$c_csv = fgetcsv($c_1,'0',',');
}
else//no correspondance
{
fputcsv($c_2,$c_csv);//write line to the new file
$c_csv = fgetcsv($c_1,'0',',');//get new line
continue;//rinse and repeat
}
}//end while, now gone through all individuals, and filled c_2 with the unused data
fclose($c_1);//close files
fclose($c_2);
copy('file2.csv','file1.csv');//copy new file to old, removing used data
$s_csv = fgetcsv($s_handle,'0',',');
}//end while
//close file
fclose($s_handle);
?>
I may not have fully understood the process, but why not just insert the entire CSV into your database table? This might seem like wasted work, but it will likely pay off. Once you have done your initial import, finding any individual associated with an event should be much faster, as the DBMS will be able to use an index to speed up these lookups (compared with your file-based linear traversal). To be precise: your "individual" table will presumably have a foreign key into your "individual_event" table. As long as you create an index on this foreign key, lookups will be significantly faster (it's possible that simply declaring this field to be a foreign key will cause SQL Server to index it automatically, but I can't say for sure, as I don't really use MSSQL).
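A minimal sketch of that idea, assuming the individuals CSV has already been bulk-loaded into a hypothetical staging table called cockle_staging whose first column is stat_no (the table and column names are illustrative, not your actual schema); $con, $ind_event_id and $grid_no are the variables from your code above:
//Index the lookup column once so the matching below is not a full table scan.
odbc_exec($con,"CREATE INDEX ix_cockle_staging_stat_no ON cockle_staging (stat_no);");
//Inside the loop over individual events, one INSERT...SELECT replaces the whole inner file scan.
$sql = "INSERT INTO fssbur2.cockle"
."(parent_id,sampling_method,shell_height,shell_width,age,weight,alive,discarded,damage) "
."SELECT '{$ind_event_id}',sampling_method,shell_height,shell_width,age,weight,alive,discarded,damage "
."FROM cockle_staging "
."WHERE stat_no = '{$grid_no}';";
if (!odbc_exec($con,$sql))
{
echo "WARNING: SQL INSERT INTO fssbur2.cockle FAILED. PHP.";
}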
As an aside, how many records are we talking about? If we are dealing with 1000s of records, it's definitely reasonable to expect this type of thing to run in a couple of seconds.
You can create a temporary database with the data from the files and then use the temporary database/tables to bring the data into the new form. This will probably be faster, especially if you need to do lookups and flag entries as processed.
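A rough sketch of that idea, assuming a hypothetical temporary table #cockle_tmp whose columns are illustrative only; the processed flag replaces the step of copying file2.csv over file1.csv:
//Load the individuals file once into a temporary table.
odbc_exec($con,"CREATE TABLE #cockle_tmp ("
."stat_no varchar(50),"
."sample_method varchar(50),"
//...one column per CSV field...
."processed bit DEFAULT 0);");
$fh = fopen($_FILES['cocklefile']['tmp_name'],'r');
while (($row = fgetcsv($fh,0,',')) !== false)
{
odbc_exec($con,"INSERT INTO #cockle_tmp (stat_no,sample_method) "
."VALUES ('{$row[0]}','{$row[1]}');");
}
fclose($fh);
//Later, instead of rewriting the CSV, flag the rows that have been dealt with.
odbc_exec($con,"UPDATE #cockle_tmp SET processed = 1 WHERE stat_no = '{$grid_no}';");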