CSV Import Split by Comma - what to do about quotes?
I have a CSV file I'm importing but am running into an issue. The data is in the format:
TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4
I need to be able to explode on the commas that are not within the quotes...
See the fgetcsv function.
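As a minimal sketch of that suggestion, the loop below reads a file row by row with fgetcsv. The temporary file and the variable names are assumptions made only to keep the example self-contained; in real use you would open your own CSV file. The delimiter, enclosure, and escape arguments are passed explicitly rather than left to their defaults.

```php
<?php
// Illustration only: write the sample row from the question to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, 'TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4' . "\n");

$rows = array();
$handle = fopen($path, 'r');
// fgetcsv() returns one parsed row per call and false at end of file.
while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    if ($fields === array(null)) {
        continue; // fgetcsv() reports a blank line as an array holding a single null.
    }
    $rows[] = $fields;
}
fclose($handle);
unlink($path);

// The commas inside the quoted text do not split the field.
echo count($rows[0]), " fields\n";
```

The stray spaces around the separators may still be attached to some fields, so trimming is best treated as a separate step after parsing.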
If you already have a string, you can create a stream that wraps it and then use fgetcsv. See http://code.google.com/p/phpstringstream/source/browse/trunk/stringstream.php
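One way to wrap a string in a stream without any extra library is PHP's built-in php://memory stream. The sketch below is not the linked stringstream class; the function name csv_rows_from_string is hypothetical and exists only for this example.

```php
<?php
// Hypothetical helper: parse a whole CSV string by routing it through an
// in-memory stream so that fgetcsv() can read it.
function csv_rows_from_string($csv, $delim = ',', $qual = '"')
{
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csv);
    rewind($handle); // Move back to the start before reading.

    $rows = array();
    while (($row = fgetcsv($handle, 0, $delim, $qual, '\\')) !== false) {
        if ($row === array(null)) {
            continue; // Skip blank lines.
        }
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

// Usage: the quoted field in the first row contains a line break.
$rows = csv_rows_from_string("a,\"line one\nline two\",c\n1,2,3\n");
var_dump(count($rows));
```

Because fgetcsv reads from the stream itself, a quoted field that spans several lines stays in one row, which splitting the input on "\n" first would not allow. For a single line that is already in a string, str_getcsv is a simpler alternative.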
If you really want to do this by hand, here's a rough reference implementation I wrote to explode a complete line of CSV text into an array. Be warned: This code does NOT handle multiple-line fields! With this implementation, the entire CSV row must exist on a single line with no line breaks!
<?php
//-----------------------------------------------------------------------
function csvexplode($str, $delim = ',', $qual = "\"")
// Explode a single CSV string (line) into an array.
{
$len = strlen($str); // Store the complete length of the string for easy reference.
$inside = false; // Maintain state when we're inside quoted elements.
$lastWasDelim = false; // Maintain state if we just started a new element.
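$out = array(); // Completed elements; initialized so that $out[] never appends to an undefined variable.
$word = ''; // Accumulator for current element.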
for($i = 0; $i < $len; ++$i)
{
// We're outside a quoted element, and the current char is a field delimiter.
if(!$inside && $str[$i]==$delim)
{
$out[] = $word;
$word = '';
$lastWasDelim = true;
}
// We're inside a quoted element, the current char is a qualifier, and the next char is a qualifier.
elseif($inside && $str[$i]==$qual && ($i+1<$len && $str[$i+1]==$qual))
{
$word .= $qual; // Add one qual into the element,
++$i; // Then skip ahead to the next non-qual char.
}
// The current char is a qualifier (so we're either entering or leaving a quoted element.)
elseif ($str[$i] == $qual)
{
$inside = !$inside;
}
// We're outside a quoted element, the current char is whitespace and the 'last' char was a delimiter.
elseif( !$inside && ($str[$i]==" ") && $lastWasDelim)
{
// Just skip the char because it's leading whitespace in front of an element.
}
// We're outside a quoted element and the current char is whitespace that does not directly follow a delimiter.
elseif(!$inside && ($str[$i]==" ") )
{
// Look ahead for the next non-whitespace char.
$lookAhead = $i+1;
while(($lookAhead < $len) && ($str[$lookAhead] == " "))
{
++$lookAhead;
}
// If we ran off the end of the line, or the next char is formatting, we're dealing with trailing whitespace.
if($lookAhead >= $len || $str[$lookAhead] == $delim || $str[$lookAhead] == $qual)
{
$i = $lookAhead-1; // Jump the pointer ahead to right before the delimiter or qualifier.
}
// Otherwise we're still in the middle of an element, so add the whitespace to the output.
else
{
$word .= $str[$i];
}
}
// If all else fails, add the character to the current element.
else
{
$word .= $str[$i];
$lastWasDelim = false;
}
}
$out[] = $word;
return $out;
}
// Examples:
$csvInput = 'Name,Address,Phone
Alice,123 First Street,"555-555-5555"
Bob,"345 Second Place, City ST",666-666-6666
"Charlie ""Chuck"" Doe", 3rd Circle ," 777-777-7777"';
// explode() emulates file() in this context.
foreach(explode("\n", $csvInput) as $line)
{
var_dump(csvexplode($line));
}
?>
I would still recommend relying on PHP's built-in function though. That's (hopefully) going to be far more reliable long term. Artefacto and Roadmaster are right: anything you have to do to the data is best done after you import it.
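A rough sketch of that "parse first, clean up afterwards" order, using the built-in str_getcsv on the sample row from the question. The column names (sku, description, price, quantity) are assumptions guessed from that one row, not something the question states.

```php
<?php
// Step 1: let the built-in parser split the line; the quoted commas stay intact.
$line = 'TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4';
$fields = str_getcsv($line, ',', '"', '\\');

// Step 2: clean up the parsed values, here by trimming stray spaces.
$fields = array_map('trim', $fields);

// Step 3: convert types. The keys below are hypothetical column names.
$record = array(
    'sku'         => $fields[0],
    'description' => $fields[1],
    'price'       => (float) ltrim($fields[2], '$'),
    'quantity'    => (int) $fields[3],
);
var_dump($record);
```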