开发者

Build flat table from CSV files

I have 500 CSV files in this format:

IndicatorA_Name.csv

        1900    1901    1902 ...
Norway  3      2       
Sweden  1      3       3
Denmark 5      2       3    
... 

IndicatorB_Name.csv

        1900    1901    1902 ...
Norway  1      3       4
Sweden  1      2       
Iceland 1      6       3    
... 
  • Years in columns, countries in rows.
  • Notice that countries, years and values may differ between files.

I'd like to run through all these files and make a flat table (CSV file) with this structure:

country, year, IndicatorA_Name, IndicatorB_Name, ...
Sweden, 1900, 1, 1
Sweden, 1901, 3, 2
Norway, 1900, 3, 1
...

Preferably 开发者_如何学Goin PHP or JavaScript but I'm willing to learn something new.


You should probably code something like following code:

    $file = file_get_contents('file.csv');
    $lines = explode("\n", $file); //lines
    $years = explode(";", $lines[0]); //first line is years, so it gives us array of years
    for($i = 1, $c = count($lines)-1; $i < $c; ++$i){ //iterate over lines (excluding years)
        $lineData = explode(';', $lines[$i]); //array from line
        $country = $lineData[0]; //first line entry is country
        unset($lineData[0]); 
        $indicators = $lineData; //and the rest are indicators
        query('INSERT INTO data(country, year, IndicatorA_Name) VALUES(?,?,?)', $country, $year, $indicators[0]);
    }


I'd suggest using fgetcsv (see link for a usage example) or str_getcsv (with "\t" as the delimiter as Czechnology suggests).

That way you automatically support edge-cases like embedded delimiters (eg a comma in a field in a comma-separated file). It's generally best not to re-invent the wheel.


Use

$lines = explode(PHP_EOL, $csv);
$data = array();
foreach ($lines as $line)
  $data[] = explode("\t", $line);

(if it's tab separated like it looks in your example) and run throu it with two loops.

EDIT

Here's a tested code:

$csv1 = <<<TXT
        1900    1901    1902
Norway  3   2   
Sweden  1   3   3
Denmark 5   2   3
TXT;
$csv2 = <<<TXT
        1900    1901    1902
Norway  1   3   4
Sweden  1   2   
Iceland 1   6   3    
TXT;

$csvs = array(
  'IndicatorA_Name' => $csv1,
  'IndicatorB_Name' => $csv2);
/* of course, if you're pulling this from csv files, 
   you need to modify it accordingly, e.g.

$files = array('IndicatorA_Name', 'IndicatorB_Name', ...);
$csvs = array();
foreach ($files as $f)
  $csvs[] = file_get_contents($path . '/' . $f . '.csv');

   or use file(), then you don't need the first `explode` line later */


$data = array();
foreach ($csvs as $indicator => $csv) {
  $lines = explode(PHP_EOL, $csv);

  $header = explode("\t", array_shift($lines));
  foreach ($lines as $line) {
    $fields = explode("\t", $line);

    for ($i = 1; $i < count($fields); $i++) {
      $data[$fields[0]][$header[$i]][$indicator] = $fields[$i];
    }
  }
}

$rows = array();
foreach ($data as $country => $years) {
  foreach ($years as $year => $values) {
    $str = sprintf(PHP_EOL."('%s', '%d'", mysql_real_escape_string($country), intval($year));

    foreach (array_keys($csvs) as $indicator) {
      if (isset($values[$indicator]))
        $str .= sprintf(", '%s'", mysql_real_escape_string(trim($values[$indicator])));
      else
        $str .= ", ''";
    }
    $rows[] = $str . ")";
  }
}

$sql = "INSERT INTO table_name (".implode(',', array_keys($csvs)).") VALUES ".
       implode(',', $rows);

$sql is now

INSERT INTO table_name (IndicatorA_Name,IndicatorB_Name) VALUES 
('Norway', '1900', '3', '1'),
('Norway', '1901', '2', '3'),
('Norway', '1902', '', '4'),
('Sweden', '1900', '1', '1'),
('Sweden', '1901', '3', '2'),
('Sweden', '1902', '3', ''),
('Denmark', '1900', '5', ''),
('Denmark', '1901', '2', ''),
('Denmark', '1902', '3', ''),
('Iceland', '1900', '', '1'),
('Iceland', '1901', '', '6'),
('Iceland', '1902', '', '3')
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜