
How do I build a regular expression to parse comma-separated values but ignore commas within double quotes?

Example string:

2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"John, Doe" ,1-888-888-4452 ext 1813

Need to mark all the commas but not the one within the double quotes.


You could use Text::CSV from CPAN.


Or use Text::CSV_XS, which does the same thing but is faster.


Use Data::Record.


If you need a regex and not a proper parser like @eugene y suggests, here is one attempt. The captures should return the list elements in order.

(?:(?:([^"]*?|".*?"),)*([^"]*?|".*?"))?
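
For comparison, here is a different, commonly used pattern (not the one above): split on every comma that is followed by an even number of double quotes up to the end of the line. This is only a sketch in Java; it assumes the quotes in each line are balanced, and the class and method names are made up for illustration. The quotes and any stray whitespace stay in the returned fields.

```java
import java.util.Arrays;
import java.util.List;

class CommaOutsideQuotes {
    // A comma followed by an even number of double quotes up to the end of the
    // line, i.e. a comma that is not inside a quoted field (balanced quotes assumed).
    private static final String SEPARATOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static List<String> split(String line) {
        // A limit of -1 keeps trailing empty fields.
        return Arrays.asList(line.split(SEPARATOR, -1));
    }

    public static void main(String[] args) {
        String line = "2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"
                + "\"John, Doe\" ,1-888-888-4452 ext 1813";
        for (String field : split(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The lookahead rescans the rest of the line for every comma, so this is fine for short records but a real CSV parser is the better choice for large files.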


Try:

use strict;
use warnings;
use Text::ParseWords;

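while (<DATA>) {
    chomp;
    # Split on commas, honoring double quotes; 0 = drop the quotes
    my @f = quotewords ',', 0, $_;
    for (@f) {
        s/^\s+|\s+$//g;              # trim leading/trailing whitespace
        s/^/"/ && s/$/"/ if /,/;     # re-quote fields that contain a comma
    }
    print join (",", @f), "\n";
}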

__DATA__
2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"John, Doe" ,1-888-888-4452 ext 1813
"ashish", "kumar", "test,1", "test2"
"foo", "b,ar", "msg1", "msg2"


I am currently working on a project, and this regular expression worked for me on a CSV file in exactly the same format.

("([^"]*)",?)|(([^",]*),?)

This will not work if a single record is randomly broken into multiple lines. I had this issue and solved it by ascertaining whether the count of non-empty matches was correct.
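
To see what the pattern above returns, here is a minimal sketch in Java (the class and method names are hypothetical). It walks the matches, skips the zero-length match the pattern produces at the end of the input, and takes group 2 for quoted fields and group 4 for unquoted ones. Note that if there is a space between the closing quote and the comma, as in the question's sample line, that space comes back as an extra whitespace-only field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedCsvRegex {
    // The pattern from the answer: a quoted field or an unquoted field,
    // each optionally followed by a comma.
    private static final Pattern FIELD =
            Pattern.compile("(\"([^\"]*)\",?)|(([^\",]*),?)");

    public static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group().isEmpty()) {
                continue; // zero-length match, e.g. at the end of the input
            }
            // Group 2 holds quoted content, group 4 holds unquoted content.
            out.add(m.group(1) != null ? m.group(2) : m.group(4));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "SAZB2314,\"John, Doe\",1-888-888-4452 ext 1813";
        for (String field : fields(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because zero-length matches are skipped, an empty field at the very end of a line (a trailing comma) is not reported by this sketch.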


I know how to do this in Java. Regular expressions might work differently in Perl, but let me show the idea: the pattern is a union of three clauses.

// 1) select any quoted text before comma
// if it fails then
// 2) select any text before comma
// if it also fails then
// 3) select any text before end of the input

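import java.util.regex.Pattern;

final String OR           = "|";
final String QUOTE        = "\"[\\s]*"; // quote with trailing whitespace
final String NON_QUOTES   = "[^\"]*";
final String COMMA        = ",";
final String NON_COMMA    = "[^,]*";
final String NON_END      = ".+";       // "[^$]+" would only exclude a literal '$'
final String END          = "$";

final Pattern p = Pattern.compile(
    QUOTE + NON_QUOTES + QUOTE + COMMA +   // 1) quoted text before a comma
    OR +
    NON_COMMA + COMMA +                    // 2) any text before a comma
    OR +
    NON_END + END);                        // 3) any text before the end of input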

This gives you matches that, unfortunately, end with a comma, except for the last one. There are no capturing groups because it does not make sense to define them in a union of clauses like this.
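
Here is a minimal runnable sketch of the same three-clause idea, with the clean-up done in code instead of capturing groups. The class name and the clean-up steps (dropping the trailing comma, trimming, and removing the surrounding quotes) are my own additions for illustration, and the last clause is written as ".+$". Like the pattern above, it does not handle a quoted field containing a comma in the last position, or whitespace before an opening quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnionSplit {
    // 1) quoted text before a comma, 2) any text before a comma,
    // 3) any remaining text up to the end of the input.
    private static final Pattern P = Pattern.compile(
            "\"\\s*[^\"]*\"\\s*," + "|" + "[^,]*," + "|" + ".+$");

    public static List<String> split(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(line);
        while (m.find()) {
            String s = m.group();
            if (s.endsWith(",")) {
                s = s.substring(0, s.length() - 1); // drop the trailing comma
            }
            s = s.trim();
            if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
                s = s.substring(1, s.length() - 1); // drop the surrounding quotes
            }
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"
                + "\"John, Doe\" ,1-888-888-4452 ext 1813";
        for (String field : split(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```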
