
Using regular expressions to split() values in a CSV file

I have a CSV file which I am parsing.

I am using split() to split the columns up by their commas.

The problem is that it is splitting columns that contain commas within the field.

The solution is to use a regular expression in the split to disregard commas with a space after them (EG: ", ") and only split commas with no trailing space (EG: ",").

Right now my split looks like this:

$div = ',';
split('$div',$line);

How would I modify my split() call?


To parse a complete and valid CSV file with PHP you just need:

$data = array_map("str_getcsv", file($fn));
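As a minimal sketch of what that one-liner does to each line, assuming a well-formed row in which any field containing a comma is quoted (the sample row is made up for illustration):

```php
<?php
// Hypothetical sample row: the second field contains a comma, so it is quoted.
$line = '"A","B1,B2","C","D"';

// Delimiter, enclosure, and escape character are passed explicitly
// instead of relying on the defaults.
$fields = str_getcsv($line, ',', '"', '\\');

print_r($fields); // four fields; the second one is the string B1,B2
```

The quoted comma stays inside its field, which is exactly what a plain split on "," cannot do.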

But if your file format is really not consistent, then you would indeed need the manual split method and a more specific regex.

preg_split('/,(?!\s)/', $line)

would be a regex you can use to match commas that are not followed by whitespace. Note that you need to use preg_split from the PCRE extension, and not the older split() call, which was deprecated in PHP 5.3 and removed in PHP 7.
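A short sketch of that call on a made-up line, assuming the file really does follow the convention that commas inside a field are always followed by a space:

```php
<?php
// Hypothetical sample line: the comma inside "B1, B2" is followed by a space.
$line = 'A,B1, B2,C,D';

// Split only on commas that are NOT followed by whitespace (negative lookahead).
$fields = preg_split('/,(?!\s)/', $line);

print_r($fields); // four fields: A / B1, B2 / C / D
```

Keep in mind that a field which legitimately starts with a space would be glued onto the previous field, so this only works if the convention holds for the whole file.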


The CSV file's fields (especially if fields have commas in them) should be encapsulated in quotes:

 "A","B1,B2","C","D"

If they are not, then that ambiguity is your first problem:

 A,B1,B2,C,D

has five fields, and there's nothing you can do about it [1].

When you have your source data sorted out, use fgetcsv to parse it.


[1] If this is really true:

The solution is to use a regular expression in the split to disregard commas with a space after them (EG: ", ") and only split commas with no trailing space (EG: ",").

that all your "internal" commas have spaces after them, then you could run a pre-processing step, replacing all ,<space> with \,. Escaping the commas within CSV resolves the ambiguity:

A,B1\,B2,C,D
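One way that pre-processing step could look, as a rough sketch. The split on unescaped commas and the final un-escaping are my own assumptions about how the escaped line would then be consumed; they are not part of the suggestion above:

```php
<?php
// Hypothetical sample line where the "internal" comma has a trailing space.
$line = 'A,B1, B2,C,D';

// Pre-processing step: replace every ",<space>" with an escaped comma "\,".
$escaped = str_replace(', ', '\,', $line); // A,B1\,B2,C,D

// Assumption: split on commas that are not preceded by a backslash.
$parts = preg_split('/(?<!\\\\),/', $escaped);

// Assumption: turn "\," back into a plain comma inside each field.
$fields = array_map(function ($field) {
    return str_replace('\,', ',', $field);
}, $parts);

print_r($fields); // four fields: A / B1,B2 / C / D
```

Note that the space after the internal comma is dropped by the replacement, so "B1, B2" comes back as "B1,B2".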


I have a CSV file which I am parsing.

You're reinventing the wheel: PHP has fine methods of accomplishing this by itself, namely fgetcsv:

$row = 1; // line counter; must be initialised before the loop uses it
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    // fgetcsv() reads one line and returns its fields as an array
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c=0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}


Always write the fields as quoted strings, like this:

$outstr .='"'.$line->linename.'",';
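A slightly fuller sketch of the same idea. The helper name csv_quote_field and the sample values are hypothetical, and doubling embedded quotes is my addition (it is the usual CSV convention), not something the line above does:

```php
<?php
// Hypothetical helper: wrap a value in double quotes and double any
// embedded double quote so the field can be read back unambiguously.
function csv_quote_field($value)
{
    return '"' . str_replace('"', '""', (string) $value) . '"';
}

// Made-up values: one contains a comma, one contains quotes.
$values = ['A', 'B1,B2', 'say "hi"'];

$outstr = implode(',', array_map('csv_quote_field', $values));
echo $outstr, "\n"; // "A","B1,B2","say ""hi"""

// Reading the line back yields the original values.
$roundTrip = str_getcsv($outstr, ',', '"', '\\');
```

If you are writing to a file handle, PHP's built-in fputcsv() applies similar quoting for you.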
