Perl regex match multiple instances of a pattern and replace
I have a string that looks like this:
abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another
I want to split this string into an array that consists of:
abc[1,2,3].something.here
foo[10,6,34].somethingelse.here
def[1,2].another
But splitting on the comma won't work, so my next idea is to first replace the commas that sit between the square brackets with something else, split on the remaining commas, and then put the original commas back.
I've tried a few approaches with little success. Any suggestions?
You can use a look-ahead assertion in the pattern:
my $s = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";
my @a = split /,(?=\w+\[)/, $s;
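The look-ahead only lets a comma act as a separator when it is immediately followed by word characters and an opening bracket, so the commas inside [...] are left alone. Here is a minimal self-contained check of that, assuming (as in the sample string) that every element starts with a name followed by "[":

```perl
use strict;
use warnings;

my $s = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";

# Split only on commas that are directly followed by "name[".
# (?=...) is zero-width, so the name itself stays in the next field.
my @parts = split /,(?=\w+\[)/, $s;

print "$_\n" for @parts;
```

If an element could lack the bracketed index (for example plain "foo.bar"), this pattern would not split before it, so the assumption above matters.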
When things get that complex, I like the parser approach.
#!/usr/bin/perl
use strict;
use warnings;
my $statement = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";
my $index = qr/\[(?:\d+)(?:,\d+)*\]/;
my $variable = qr/\w+$index?/;
my $expression = qr/$variable(?:\.$variable)*/;
my @expressions = ($statement =~ /($expression)/g);
print "$_\n" for @expressions;
Iterate through the characters in the string like this (pseudocode):
    found_closing_bracket = 0
    buffer = ''
    array = []
    foreach c in str:
        if c == ']'
            found_closing_bracket = 1
        if c == ',' && found_closing_bracket == 1
            push(array, buffer)
            buffer = ''
            found_closing_bracket = 0
        else
            buffer = buffer + c
    push(array, buffer)   # the last element has no trailing comma
Sure, you could use regular expressions, but personally I'd rather aim for a simpler solution, even if it's more hackish. Regular expressions can be a pain to read.
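For reference, the same character-by-character idea might look like this in Perl. This is only a sketch: it swaps the found_closing_bracket flag for a bracket depth counter, so a comma counts as a separator only when no "[" is open, and it assumes the brackets are balanced. The name split_outside_brackets is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are outside [...].
sub split_outside_brackets {
    my ($str) = @_;
    my @parts;
    my $buffer = '';
    my $depth  = 0;    # number of '[' currently open

    for my $c (split //, $str) {
        if    ($c eq '[') { $depth++ }
        elsif ($c eq ']') { $depth-- if $depth > 0 }

        if ($c eq ',' && $depth == 0) {
            push @parts, $buffer;    # a top-level comma ends the element
            $buffer = '';
        }
        else {
            $buffer .= $c;           # everything else, including inner commas
        }
    }
    # The last element is not followed by a comma, so flush it here.
    push @parts, $buffer if length $buffer;
    return @parts;
}

my @parts = split_outside_brackets(
    "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another"
);
print "$_\n" for @parts;
```

Counting depth rather than remembering the last "]" also copes with elements that contain more than one bracketed index.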
An alternative to eugene y's answer:
my $s = "abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another";
# name, bracketed index list, then everything up to the next top-level comma
my @a = ($s =~ /[^,]+\[[\d,]*\][^,]*/g);
print join("\n", @a, "");
This question gave me an excuse to take a look at Regexp::Grammars, which I had wanted to try for some time. The following snippet works for your input:
use Regexp::Grammars;
use Data::Dump qw(dd);
my $input
= 'abc[1,2,3].something.here,foo[10,6,34].somethingelse.here,def[1,2].another';
my $re = qr{
<[tokens]> ** (,) # comma separated tokens
<rule: tokens> <.token>*
<rule: token> \w+ | [.] | <bracketed>
<rule: bracketed> \[ <.token> ** (,) \]
}x;
dd $/{tokens}
if $input =~ $re;
# prints
# [
# "abc[1,2,3].something.here",
# "foo[10,6,34].somethingelse.here",
# "def[1,2].another",
# ]