Using regular expressions to split() values in a CSV file
I have a CSV file which I am parsing.
I am using split() to split the columns up by their commas.
The problem is that it is splitting columns that contain commas within the field.
The solution is to use a regular expression in the split to disregard commas with a space after them (EG: ", ") and only split commas with no trailing space (EG: ",").
Right now my split looks like this:
$div = ',';
split('$div',$line);
How would I modify my split() call?
To parse a complete and valid CSV file with PHP you just need:
$data = array_map("str_getcsv", file($fn));
But if your file format is really not consistent, then you would indeed need the manual split method and a more specific regex.
preg_split('/,(?!\s)/', $line)
would be the regex you can use to match commas that are not followed by whitespace (the (?!\s) part is a negative lookahead). Note that you need to use preg_split from the PCRE library, and not the older split call, which was deprecated in PHP 5.3 and removed in PHP 7.
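As a rough illustration of that pattern, here is a minimal sketch. The sample line is made up for the example and assumes, as the question does, that every comma inside a field is followed by a space:

```php
<?php
// Hypothetical sample line: in-field commas are always followed by a space.
$line = 'Smith, John,42,New York, NY';

// Split only on commas that are NOT followed by whitespace.
$fields = preg_split('/,(?!\s)/', $line);

// $fields now holds three entries: "Smith, John", "42" and "New York, NY".
print_r($fields);
```

This only works as long as the "comma plus space" convention really holds for every row; a field that legitimately starts with a space would be glued to the previous one.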
The CSV file's fields (especially if fields have commas in them) should be encapsulated in quotes:
"A","B1,B2","C","D"
If they are not, then that ambiguity is your first problem:
A,B1,B2,C,D
has five fields, and there's nothing you can do about it [1].
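To make the difference concrete, here is a small sketch that runs both sample lines through str_getcsv. The delimiter, enclosure and escape arguments are spelled out explicitly; they match the usual defaults:

```php
<?php
// Quoted fields: the comma inside "B1,B2" is data, not a separator.
$quoted = str_getcsv('"A","B1,B2","C","D"', ',', '"', '\\');

// Unquoted fields: every comma is a separator, so the parser sees five fields.
$unquoted = str_getcsv('A,B1,B2,C,D', ',', '"', '\\');

echo count($quoted), "\n";   // 4
echo count($unquoted), "\n"; // 5
```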
When you have your source data sorted out, use fgetcsv to parse it.
[1] If this is really true:
The solution is to use a regular expression in the split to disregard commas with a space after them (EG: ", ") and only split commas with no trailing space (EG: ",").
that all your "internal" commas have spaces after them, then you could run a pre-processing step, replacing all ,<space> with \, . Escaping the commas within CSV resolves the ambiguity:
A,B1\,B2,C,D
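One way that pre-processing step might look is sketched below. The input line is hypothetical, and the splitting is done by hand with preg_split on commas that are not preceded by a backslash; this is an assumption about how to consume the escaped form, not something fgetcsv is being relied on to do:

```php
<?php
// Hypothetical input where every in-field comma is followed by a space.
$line = 'A,B1, B2,C,D';

// Pre-processing step: turn each ", " into an escaped comma "\,".
$escaped = str_replace(', ', '\\,', $line);

// Split on commas that are not preceded by a backslash, then unescape.
$fields = array_map(
    function ($field) {
        return str_replace('\\,', ',', $field);
    },
    preg_split('/(?<!\\\\),/', $escaped)
);

// $fields now holds four entries: "A", "B1,B2", "C" and "D".
print_r($fields);
```

Note that the space after the in-field comma is dropped by the replacement; replace with "\\, " instead if it needs to be preserved.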
I have a CSV file which I am parsing.
You're reinventing the wheel: PHP has fine methods of accomplishing this by itself, namely fgetcsv:
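$row = 1; // line counter used in the output below
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    // Read one record at a time; fgetcsv copes with quoted fields that contain commas
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c = 0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}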
When writing the file, always output the values as quoted strings, like this:
$outstr .='"'.$line->linename.'",';
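A slightly fuller sketch of the same idea follows. The helper name csv_quote_line is made up for this example; it wraps every value in double quotes and doubles any embedded quote characters, which is the usual CSV convention, and it avoids the trailing comma by joining with implode:

```php
<?php
// Hypothetical helper: quote every field and double embedded quote characters.
function csv_quote_line(array $values): string
{
    $quoted = array_map(
        function ($value) {
            return '"' . str_replace('"', '""', (string) $value) . '"';
        },
        $values
    );
    // Join with commas so there is no trailing separator.
    return implode(',', $quoted);
}

echo csv_quote_line(['A', 'B1,B2', 'say "hi"']), "\n";
```

PHP's built-in fputcsv is another option when writing directly to a file handle, although it only quotes fields that need it.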