Build flat table from CSV files
I have 500 CSV files in this format:
IndicatorA_Name.csv
1900 1901 1902 ... Norway 3 2 Sweden 1 3 3 Denmark 5 2 3 ...
IndicatorB_Name.csv
1900 1901 1902 ... Norway 1 3 4 Sweden 1 2 Iceland 1 6 3 ...
- Years in columns, countries in rows.
- Notice that countries, years and values may differ between files.
I'd like to run through all these files and make a flat table (CSV file) with this structure:
country, year, IndicatorA_Name, IndicatorB_Name, ... Sweden, 1900, 1, 1 Sweden, 1901, 3, 2 Norway, 1900, 3, 1 ...
Preferably 开发者_如何学Goin PHP or JavaScript but I'm willing to learn something new.
You should probably code something like following code:
$file = file_get_contents('file.csv');
$lines = explode("\n", $file); //lines
$years = explode(";", $lines[0]); //first line is years, so it gives us array of years
for($i = 1, $c = count($lines)-1; $i < $c; ++$i){ //iterate over lines (excluding years)
$lineData = explode(';', $lines[$i]); //array from line
$country = $lineData[0]; //first line entry is country
unset($lineData[0]);
$indicators = $lineData; //and the rest are indicators
query('INSERT INTO data(country, year, IndicatorA_Name) VALUES(?,?,?)', $country, $year, $indicators[0]);
}
I'd suggest using fgetcsv
(see link for a usage example) or str_getcsv
(with "\t"
as the delimiter as Czechnology suggests).
That way you automatically support edge-cases like embedded delimiters (eg a comma in a field in a comma-separated file). It's generally best not to re-invent the wheel.
Use
$lines = explode(PHP_EOL, $csv);
$data = array();
foreach ($lines as $line)
$data[] = explode("\t", $line);
(if it's tab separated like it looks in your example) and run throu it with two loops.
EDIT
Here's a tested code:
$csv1 = <<<TXT
1900 1901 1902
Norway 3 2
Sweden 1 3 3
Denmark 5 2 3
TXT;
$csv2 = <<<TXT
1900 1901 1902
Norway 1 3 4
Sweden 1 2
Iceland 1 6 3
TXT;
$csvs = array(
'IndicatorA_Name' => $csv1,
'IndicatorB_Name' => $csv2);
/* of course, if you're pulling this from csv files,
you need to modify it accordingly, e.g.
$files = array('IndicatorA_Name', 'IndicatorB_Name', ...);
$csvs = array();
foreach ($files as $f)
$csvs[] = file_get_contents($path . '/' . $f . '.csv');
or use file(), then you don't need the first `explode` line later */
$data = array();
foreach ($csvs as $indicator => $csv) {
$lines = explode(PHP_EOL, $csv);
$header = explode("\t", array_shift($lines));
foreach ($lines as $line) {
$fields = explode("\t", $line);
for ($i = 1; $i < count($fields); $i++) {
$data[$fields[0]][$header[$i]][$indicator] = $fields[$i];
}
}
}
$rows = array();
foreach ($data as $country => $years) {
foreach ($years as $year => $values) {
$str = sprintf(PHP_EOL."('%s', '%d'", mysql_real_escape_string($country), intval($year));
foreach (array_keys($csvs) as $indicator) {
if (isset($values[$indicator]))
$str .= sprintf(", '%s'", mysql_real_escape_string(trim($values[$indicator])));
else
$str .= ", ''";
}
$rows[] = $str . ")";
}
}
$sql = "INSERT INTO table_name (".implode(',', array_keys($csvs)).") VALUES ".
implode(',', $rows);
$sql is now
INSERT INTO table_name (IndicatorA_Name,IndicatorB_Name) VALUES
('Norway', '1900', '3', '1'),
('Norway', '1901', '2', '3'),
('Norway', '1902', '', '4'),
('Sweden', '1900', '1', '1'),
('Sweden', '1901', '3', '2'),
('Sweden', '1902', '3', ''),
('Denmark', '1900', '5', ''),
('Denmark', '1901', '2', ''),
('Denmark', '1902', '3', ''),
('Iceland', '1900', '', '1'),
('Iceland', '1901', '', '6'),
('Iceland', '1902', '', '3')
精彩评论