Split on comma, but only when not in parentheses

I am trying to split a string on a comma delimiter:

my $string='ab,12,20100401,xyz(A,B)';
my @array=split(',',$string);

If I do a split as above, the array will have these values:

ab
12
20100401
xyz(A,
B)

I need the values as below:

ab
12
20100401
xyz(A,B)

(It should not split xyz(A,B) into two values.) How do I do that?


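use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $string = "ab,12,20100401,xyz(A,B(a,d))";
my @params;
while (length $string) {    # length, so a remaining "0" is still processed
    if ($string =~ /^([^(]*?),/) {
        # Plain field: no parenthesis before the next comma
        push @params, $1;
        $string =~ s/^\Q$1\E\s*,?\s*//;
    } else {
        # Field with a balanced (...) group, possibly nested
        my ($ext, $rest, $pre) = extract_bracketed($string, '()', '[^()]+');
        if (!defined $ext) {
            # Nothing bracketed is left: the remainder is the last field
            push @params, $string;
            last;
        }
        push @params, "$pre$ext";
        ($string = $rest) =~ s/^\s*,\s*//;
    }
}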

This one supports:

  • nested parentheses;
  • empty fields;
  • strings of any length.


Here is one way that should work.

use Regexp::Common;

my $string = 'ab,12,20100401,xyz(A,B)';
my @array = ($string =~ /(?:$RE{balanced}{-parens=>'()'}|[^,])+/g);

Regexp::Common can be installed from CPAN.

Be warned of one limitation in this code: the pattern needs at least one character per match, so it (unfortunately) returns nothing for an empty field. Two consecutive commas (,,) do not produce an empty element.
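
The dropped empty field can be reproduced without Regexp::Common. The sketch below uses a simplified stand-in pattern (no parenthesis handling, and the variable names are only illustrative) and compares it with split called with a negative limit, which keeps empty fields:

```perl
use strict;
use warnings;

my $line = 'ab,,20100401';

# The "+" quantifier needs at least one character, so the empty middle field vanishes.
my @matched = ($line =~ /[^,]+/g);

# split with a negative limit keeps every field, including empty ones.
my @split = split /,/, $line, -1;

printf "match: %d fields, split: %d fields\n", scalar @matched, scalar @split;
```

If empty fields matter in your data, check for them before relying on a match-based approach.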


This is an old question, but I wrestled with the same problem all night, and no answer was ever accepted, so in case anyone else arrives here from Google as I did, here is what I finally came up with. It's a very short answer using only built-in Perl regex features (recursive subpatterns and the (*SKIP) verb, so it needs Perl 5.10 or later):

my $string='ab,12,20100401,xyz(A,B)';
$string =~ s/((\((?>[^)(]*(?2)?)*\))|[^,()]*)(*SKIP),/$1\n/g;
my @array=split('\n',$string);

Commas that are not inside parentheses are changed to newlines and then the array is split on them. This will ignore commas inside any level of nested parentheses, as long as they're properly balanced with a matching number of open and close parens.

This assumes there are no newline (\n) characters in the initial value of $string. If there might be, either temporarily replace them with something else before the substitution and restore them in a loop after the split, or pick a different delimiter to split the array on.
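
A variation that avoids the placeholder delimiter altogether is to match the fields directly. This is only a sketch under the same Perl 5.10+ assumption; the names $field_re and paren are made up for illustration, and on unbalanced input the loop simply stops early:

```perl
use strict;
use warnings;

my $string = 'ab,12,20100401,xyz(A,B(a,d)),,last';
my @fields;

# A field is any mix of plain text and balanced (...) groups.
# The named group "paren" calls itself, so nesting depth is unlimited.
my $field_re = qr{
    \G ( (?: [^,()]++ | (?&paren) )* )   # group 1: one field, possibly empty
    ( , | \z )                           # group 2: separator, or end of input
    (?(DEFINE)
        (?<paren> \( (?: [^()]++ | (?&paren) )* \) )
    )
}x;

while ($string =~ /$field_re/g) {
    push @fields, $1;
    last if $2 eq '';   # end of input reached; avoid one extra empty match
}

print "[$_]\n" for @fields;
```

Because nothing is substituted into the string, newlines in the data need no special treatment, and empty fields are kept.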


If the parenthesized value is always the last field and the number of fields is fixed, limit the number of elements the string can be split into:

split(',', $string, 4)
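
A quick check of where the limit helps and where it does not. The second input string is invented for illustration; it puts a parenthesized value before the last field:

```perl
use strict;
use warnings;

# Parenthesized value is the last of four fields: the limit keeps it whole.
my @last_only = split /,/, 'ab,12,20100401,xyz(A,B)', 4;

# Parenthesized value appears earlier: it is still split at its inner comma.
my @earlier = split /,/, 'ab,f(1,2),20100401,xyz(A,B)', 4;

print join(' | ', @last_only), "\n";
print join(' | ', @earlier), "\n";
```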


Here's another way:

my $string='ab,12,20100401,xyz(A,B)';
my @array = ($string =~ /(
    [^,]*\([^)]*\)   # comma inside parens is part of the word
    |
    [^,]*)           # split on comma outside parens
    (?:,|$)/gx);

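This handles only one level of parentheses. The match list also ends with one extra empty string, because the pattern can match the empty string at the end of the input. Printed one element per line, it produces: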

ab
12
20100401
xyz(A,B)


Here is my attempt. It should handle any nesting depth and could easily be extended to other kinds of brackets (though it would then be harder to check that each opening bracket is closed by the matching kind). In general, this method will not work for quotation marks in place of brackets, because the opening and closing characters are the same.

#!/usr/bin/perl

use strict;
use warnings;

my $string='ab,12,20100401,xyz(A(2,3),B)';

print "$_\n" for parse($string);

sub parse {
  my ($string) = @_;
  my @fields;

  my @comma_separated = split(/,/, $string);

  my @to_be_joined;
  my $depth = 0;
  foreach my $field (@comma_separated) {
    my @brackets = $field =~ /(\(|\))/g;
    foreach (@brackets) {
      $depth++ if /\(/;
      $depth-- if /\)/;
    }

    if ($depth == 0) {
      push @fields, join(",", @to_be_joined, $field);
      @to_be_joined = ();
    } else {
      push @to_be_joined, $field;
    }
  }

  return @fields;
}
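
The same depth-counting idea can also be applied one character at a time, which makes it easy to reject unbalanced input instead of silently dropping fields. This is a minimal sketch, not part of the answer above; the name split_top_level is hypothetical:

```perl
use strict;
use warnings;

# Split on commas that are not inside parentheses; die on unbalanced input.
sub split_top_level {
    my ($text) = @_;
    my @fields = ('');
    my $depth  = 0;

    for my $ch (split //, $text) {
        if ($ch eq ',' && $depth == 0) {
            push @fields, '';          # top-level comma starts a new field
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        die "unbalanced ')' in '$text'\n" if $depth < 0;
        $fields[-1] .= $ch;            # everything else belongs to the current field
    }

    die "unbalanced '(' in '$text'\n" if $depth > 0;
    return @fields;
}

print "$_\n" for split_top_level('ab,12,20100401,xyz(A(2,3),B)');
```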