Perl regex match multiple instances of a pattern and replace

I have a string that looks like this:

abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another

I want to split this string into an array that consists of:

abc[1,2,3].something.here
foo[10,6,34].somethingelse.here
def[1,2].another

But splitting on the comma won't work, so my next idea is to first replace the commas that sit between the square brackets with something else, split on the remaining commas, and then put the original commas back afterwards.

I've tried a few approaches with little success. Any suggestions?


You can use a look-ahead assertion in the pattern, so the split only happens at a comma that is followed by a name and an opening bracket:

my $s = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";
my @a = split /,(?=\w+\[)/, $s;
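
To make the effect of the look-ahead concrete, here is a small sketch that runs the split on the string from the question. The second pattern is a variation that is not part of the answer above: it assumes brackets are never nested, and splits at any comma that is not inside a [...] group, which also covers items that have no brackets at all.

```perl
use strict;
use warnings;

my $s = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";

# Split only at a comma that is followed by word characters and an opening bracket.
my @by_lookahead = split /,(?=\w+\[)/, $s;

# Variation (an assumption, not from the answer): refuse to split at a comma
# when the next bracket character after it is a closing one, i.e. when the
# comma sits inside [...]. Nested brackets are not handled.
my @outside_brackets = split /,(?![^\[\]]*\])/, $s;

print "$_\n" for @by_lookahead;
```

The first pattern depends on every item starting with a name immediately followed by "[". If some items may lack an index, the second pattern is the safer choice.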


When things get that complex, I like the parser approach.

#!/usr/bin/perl
use strict;
use warnings;

my $statement  =  "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";

my $index      = qr/\[(?:\d+)(?:,\d+)*\]/;
my $variable   = qr/\w+$index?/;
my $expression = qr/$variable(?:\.$variable)*/;

my @expressions = ($statement =~ /($expression)/g);

print "$_\n" for @expressions;


Iterate through the characters in the string like this (pseudocode). Note that it relies on every item containing a bracket group:

found_closing_bracket = 0
buffer = ''
array = []

foreach c in str:

   if c == ']'
      found_closing_bracket = 1

   if c == ',' && found_closing_bracket == 1
      push(array, buffer)
      buffer = ''
      found_closing_bracket = 0

   else
      buffer = buffer + c

// after the loop, keep the last item as well
push(array, buffer)

Sure, you could use regular expressions, but personally I'd rather aim for a simpler solution, even if it's more hackish. Regular expressions can be a pain to read.
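
For reference, the character scan could look roughly like this in Perl. This is a sketch rather than a direct transcription: the helper name split_outside_brackets is made up for illustration, and it tracks bracket depth instead of a "seen a closing bracket" flag, so items without brackets are split correctly too.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are outside square brackets.
sub split_outside_brackets {
    my ($str) = @_;
    my ($depth, $buffer, @items) = (0, '');

    for my $c (split //, $str) {
        if    ($c eq '[') { $depth++ }
        elsif ($c eq ']') { $depth-- if $depth > 0 }

        if ($c eq ',' && $depth == 0) {
            push @items, $buffer;    # a top-level comma ends the current item
            $buffer = '';
        }
        else {
            $buffer .= $c;           # everything else belongs to the item
        }
    }

    push @items, $buffer if length $buffer;    # flush the last item
    return @items;
}

my $s = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";
print "$_\n" for split_outside_brackets($s);
```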


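An alternative to eugene y's answer is to match the items themselves instead of splitting:

my $s = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";
# name, bracketed index list, then everything up to the next top-level comma
my @a = ($s =~ /[^,]+\[[\d,]*\][^,]*/g);
print join("\n", @a, "");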


This question gave me an excuse to take a look at Regexp::Grammars, which I had wanted to try for some time. The following snippet works for your input:

use Regexp::Grammars;
use Data::Dump qw(dd);

my $input
    = 'abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another';

my $re = qr{
    <[tokens]> ** (,)  # comma separated tokens

    <rule: tokens>     <.token>*
    <rule: token>      \w+ | [.] | <bracketed>
    <rule: bracketed>  \[ <.token> ** (,) \]
}x;

dd $/{tokens}
    if $input =~ $re;

# prints
# [
#   "abc[1,2,3].something.here",
#   "foo[10,6,34].somethingelse.here",
#   "def[1,2].another",
# ]