Am I taking the proper approach to dealing with these files? (CSV with PHP)

2023-03-16 22:50 Q&A author:

I am a student on a summer placement. I have been given the task of entering survey data, collected over a number of years, from Excel into a SQL Server database. The task is outlined below:

There are three tables, a main event, an individual event and an individual. An event has many individual events, an individual event has many individuals. My code regards just the last two tables.

I read two files, a list of all individual events in one file, and a list of all individuals in the other. The individual's data tells me what individual event it is associated with.

My code basically reads an individual event, then looks through the second file for any associated individuals. For each line in the individuals file, if it is associated, it is inserted into the proper table; otherwise it is written to a new file. Once the whole file has been traversed, the new file is copied over the old file, thus removing data already entered into the database.

This copying across has knocked a good 3 minutes of execution time off simply re-reading the full individuals file again and again. But is there a better approach to this? My execution time for my sample data is ~47 seconds...ideally I'd like that lower.

Any advice, however trivial, would be appreciated.

EDIT: This is a cut down version of the code I am using

<?php
//not shown:
//connect to database 
//input event data
//get the id of the event
//open files
$s_handle = fopen($_FILES['surveyfile']['tmp_name'],'r');//open survey file
copy($_FILES['cocklefile']['tmp_name'],'file1.csv');//make copy of the cockle file
//read files
$s_csv = fgetcsv($s_handle,'0',',');

//read lines and print lines
// then input data via sql

while (! feof($s_handle))
{
    $max_index = count($s_csv);
    $s_csv[$max_index]='';
    foreach($s_csv as $val)
    {
        if(!isset($val))
        $val = '';
    }
    $grid_no = $s_csv[0];
    $sub_loc = $s_csv[1];
    /*
    .define more variables
    .*/
    

    $sql = "INSERT INTO independant_event " //same table name as the SELECT below
        ."(parent_id,grid_number,sub_location,....)"
        ."VALUES ("
        ."'{$event_id}',"
        ."'{$grid_no}',"
        //...
        .");";

    if (!odbc_exec($con,$sql))
    {
        echo "WARNING: SQL INSERT INTO fssbur.cockle_quadrat FAILED. PHP.";
    }
    //get ID
    $sql = "SELECT MAX(ind_event_id) " //trailing space keeps FROM separate once concatenated
    ."FROM independant_event";
    $return =  odbc_exec($con,$sql);
    $ind_event_id = odbc_result($return, 1);
    
    //insert individuals
    $c_2 = fopen('file2.csv','w');//create file c_2 to write to 
    $c_1 = fopen('file1.csv','r');//open the data to read
    $c_csv = fgetcsv($c_1,'0',',');//get the first line of data
    while(! feof($c_1))
    {
        
        for($i=0;$i<9;$i++)//make sure there's a value in each column
        {
            if(!isset($c_csv[$i]))
            $c_csv[$i] = '';
        }
        //give values meaningful names
        $stat_no = $c_csv[0];
        $sample_method = $c_csv[1];
        //....
        
        //check whether the current line corresponds to the current station
        if (strcmp(strtolower($stat_no),strtolower($grid_no))==0)
        {
            $sql = "INSERT INTO fssbur2.cockle"
                ."(parent_id,sampling_method,shell_height,shell_width,age,weight,alive,discarded,damage)"
                ."VALUES("
                ."'{$ind_event_id}',"
                ."'{$sample_method}',"
                //...
                ."'{$damage}');";
            //write data if it corresponds
            if (!odbc_exec($con,$sql))
            {
                echo "WARNING: SQL INSERT INTO fssbur.cockle FAILED. PHP.";
            }     
            $c_csv = fgetcsv($c_1,'0',',');  
        }
        else//no correspondence
        {
            fputcsv($c_2,$c_csv);//write line to the new file
            $c_csv = fgetcsv($c_1,'0',',');//get new line
            continue;//rinse and repeat
        }
    }//end while, now gone through all individuals, and filled c_2 with the unused data
    fclose($c_1);//close files
    fclose($c_2);
    copy('file2.csv','file1.csv');//copy new file to old, removing used data
    $s_csv = fgetcsv($s_handle,'0',',');
}//end while

//close file
fclose($s_handle);
?>

I may not have fully understood the process, but why not just insert the entire CSV into your database table? This might seem like wasted work, but it will likely pay off. Once you have done the initial import, finding the individuals associated with an event should be much faster, because the DBMS can use an index for these lookups instead of your file-based linear traversal. To be precise: your "individual" table will presumably have a foreign key into your "individual_event" table. As long as you create an index on this foreign key, lookups will be significantly faster. (It's possible that simply declaring the field as a foreign key causes SQL Server to index it automatically, but I can't say for sure; I don't really use MSSQL.)

As an aside, how many records are we talking about? If we are dealing with thousands of records, it's definitely reasonable to expect this type of thing to run in a couple of seconds.
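To make the "import everything first, then look up by index" idea concrete, here is a minimal sketch. It is not the asker's schema: the table `individual_raw`, its columns, the index name and the sample rows are all made up for illustration, and an in-memory SQLite database (through PDO) stands in for SQL Server so the snippet runs without a server. With SQL Server you would swap the connection and adjust the SQL dialect as needed.

```php
<?php
// Sketch only: SQLite in memory stands in for SQL Server.
// Table, column and index names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One table holding every line of the individuals file, plus an index
// on the column that ties an individual to its individual event.
$pdo->exec('CREATE TABLE individual_raw (stat_no TEXT, sampling_method TEXT)');
$pdo->exec('CREATE INDEX idx_individual_raw_stat ON individual_raw (stat_no)');

// Hypothetical sample data in place of the uploaded CSV file.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "A1,rake\nb2,dredge\na1,rake\n");
rewind($handle);

// Read the file exactly once; one transaction and one prepared
// statement avoid paying a per-row commit and re-parsing the SQL.
$insert = $pdo->prepare(
    'INSERT INTO individual_raw (stat_no, sampling_method) VALUES (?, ?)'
);
$pdo->beginTransaction();
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    if ($row === [null]) {
        continue; // fgetcsv returns [null] for a blank line
    }
    // Lower-case the key once here instead of on every comparison.
    $insert->execute([strtolower($row[0]), $row[1] ?? '']);
}
$pdo->commit();
fclose($handle);

// Finding the individuals for one event is now an indexed query
// rather than a pass over the whole file.
$lookup = $pdo->prepare(
    'SELECT sampling_method FROM individual_raw WHERE stat_no = ?'
);
$lookup->execute([strtolower('A1')]);
$methods = $lookup->fetchAll(PDO::FETCH_COLUMN);
```

Testing the result of `fgetcsv` in the loop condition, rather than calling `feof` first, also avoids processing a spurious empty row at the end of the file.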

You can create a temporary database with the data from the files, and then use the temporary database/tables to bring the data into the new form. This will probably be faster, especially if you need to do lookups and flag entries as processed.
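A rough sketch of the staging-table idea follows, under the same caveats: the tables `individual_event`, `individual` and `stage_individual`, the `processed` column and the sample rows are all hypothetical, and in-memory SQLite via PDO is used only so the example is self-contained. The point is that the matching is done by one set-based `INSERT ... SELECT` instead of a nested loop over files.

```php
<?php
// Sketch only: SQLite in memory stands in for SQL Server.
// All table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Simplified final tables.
$pdo->exec('CREATE TABLE individual_event (
    ind_event_id INTEGER PRIMARY KEY,
    grid_number  TEXT
)');
$pdo->exec('CREATE TABLE individual (parent_id INTEGER, sampling_method TEXT)');

// Temporary staging table for the raw individuals file, with a flag
// that records which rows have already been moved.
$pdo->exec('CREATE TEMP TABLE stage_individual (
    stat_no         TEXT,
    sampling_method TEXT,
    processed       INTEGER NOT NULL DEFAULT 0
)');

// Hypothetical sample rows in place of the two CSV files.
$pdo->exec("INSERT INTO individual_event (grid_number) VALUES ('A1'), ('B2')");
$pdo->exec("INSERT INTO stage_individual (stat_no, sampling_method)
            VALUES ('a1', 'rake'), ('B2', 'dredge'), ('Z9', 'rake')");

$pdo->beginTransaction();
// Move every staged row that matches an event, in one statement.
$pdo->exec('INSERT INTO individual (parent_id, sampling_method)
            SELECT e.ind_event_id, s.sampling_method
            FROM stage_individual AS s
            JOIN individual_event AS e
              ON LOWER(e.grid_number) = LOWER(s.stat_no)
            WHERE s.processed = 0');
// Flag the rows that were moved so a later run skips them.
$pdo->exec('UPDATE stage_individual
            SET processed = 1
            WHERE processed = 0
              AND EXISTS (SELECT 1 FROM individual_event AS e
                          WHERE LOWER(e.grid_number) = LOWER(stage_individual.stat_no))');
$pdo->commit();

$inserted = (int) $pdo->query('SELECT COUNT(*) FROM individual')->fetchColumn();
// Rows still unflagged had no matching event and can be reviewed by hand.
$leftover = $pdo->query('SELECT stat_no FROM stage_individual WHERE processed = 0')
                ->fetchAll(PDO::FETCH_COLUMN);
```

The unflagged rows play the role of the "new file" in the original script: whatever is left over after the join is the data that matched no event.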
