Simple perl split() and Regular Expression question [duplicate]
Possible Duplicate:
How can I parse quoted CSV in Perl with a regex?
I am attempting to take a CSV file and import each row into an array (where each element represents a column). The format of a CSV file is very simple:
item1,item2,item3
nextrowitem1,item2,item3
"items,with,commas","are,in,quotes"
I imported the CSV file using:
open(my $fh, '<', 'test.csv') or die "Cannot open test.csv: $!";
my @lines = <$fh>;
close($fh);
Then I looped through it using:
foreach (@lines) {
    my @items = split(/regular expression/);
    # Do stuff with @items array
}
(Note that you do not need to write split(/regular expression/, $string); because split() assumes $_ if no string is supplied.)
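A minimal sketch of that default, using made-up sample lines rather than the real file (the variable names @implicit and @explicit are only for illustration): inside the loop, split(/,/) and split(/,/, $_) should give the same fields.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines read from test.csv.
my @lines = ("item1,item2,item3\n", "nextrowitem1,item2,item3\n");

my (@implicit, @explicit);
foreach (@lines) {
    chomp;                               # chomp also works on $_ by default
    push @implicit, [ split(/,/) ];      # no string given: split uses $_
    push @explicit, [ split(/,/, $_) ];  # same call with $_ spelled out
}

print join(' | ', @{ $implicit[0] }), "\n";
print join(' | ', @{ $explicit[0] }), "\n";
```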
Before this, I tested with a CSV file in which none of the items contained commas, using the simple regular expression split(/,/). This worked just fine, so there is nothing wrong with the file, the code that reads it, or my loop. However, when I hit items that contained a comma, they were understandably divided like so:
1 => "items
2 => with
3 => commas"
4 => "are
5 => in
6 => quotes"
Instead of the desired:
1 => items,with,commas
2 => are,in,quotes
Can anyone help me develop a regular expression to split each line correctly? Basically, if the item starts with a quote ("), it needs to wait until "," to split. If the item does not start with a quote, it needs to wait until , to split.
Try Text::CSV, which already does this. The problem with parsing CSV using a regular expression is that you have to look for separators like "," (which you indicated) as well as a plain , separation.
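A rough sketch of what that could look like, assuming Text::CSV is installed from CPAN (it is not part of core Perl). The sample lines are copied from the question and held in an array instead of being read from test.csv; @rows is just an illustrative name.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module; install it first if it is missing

# binary => 1 lets fields contain arbitrary characters.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Sample lines from the question, without trailing newlines.
my @lines = (
    'item1,item2,item3',
    'nextrowitem1,item2,item3',
    '"items,with,commas","are,in,quotes"',
);

my @rows;
for my $line (@lines) {
    if ($csv->parse($line)) {
        my @items = $csv->fields();   # quotes are already stripped
        push @rows, \@items;
        print join(' | ', @items), "\n";
    }
    else {
        warn "Could not parse line: $line\n";
    }
}
```

For a real file, the module's getline method on an open filehandle is probably the better fit, since it also copes with quoted fields that span several lines.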
Just use Text::CSV_XS instead...
See my post that solves this problem for more detail.
^(?:(?:"((?:""|[^"])+)"|([^,]*))(?:$|,))+$
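This will match the whole line, and the captures give you the data without the quotes. Note, however, that in Perl a capture group inside a repeated group keeps only its last match, so to get every field out you need to apply the per-field part of the pattern repeatedly, for example with the /g modifier.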
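One way to pull out every field with a regex is to walk the line one field at a time using \G and /g. This is only a sketch: split_csv_line is a made-up helper name, the pattern is a variation on the one above rather than the same expression, and it assumes each record sits on a single line with "" as the only escape inside quoted fields.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line into fields with a regex.
sub split_csv_line {
    my ($line) = @_;
    chomp $line;
    my @fields;

    # Each iteration consumes one field plus its separator (comma or end).
    while ($line =~ /\G(?:"((?:""|[^"])*)"|([^,]*))(,|$)/g) {
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;      # un-double escaped quotes
            push @fields, $quoted;
        }
        else {
            push @fields, $plain;
        }
        last if $sep eq '';           # matched end of line: no more fields
    }
    return @fields;
}

my @items = split_csv_line('"items,with,commas","are,in,quotes"');
print "$_\n" for @items;
# items,with,commas
# are,in,quotes
```

Anything beyond this simple format, such as embedded newlines, is where a CSV module tends to be the safer choice.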