How to exclude submatches in Perl?
I have to split a string into pieces containing words or special characters.
Let's say I have the string 'This is "another problem..."'. What I want is an array consisting of these pieces: ('This', 'is', '"', 'another', 'problem', '...', '"').
I have done this in JavaScript with the following RegExp which works fine:
string.match(/([^-\s\w])\1*|[-\w]+/g); // works
The same approach does not work in Perl: the subpattern I use to combine consecutive identical characters is a capture group, so its submatches end up in the result as well:
@matches = $string =~ m/(([^-\s\w])\2*|[-\w]+)/g; # does not work
Is there a way of getting rid of the subpatterns/submatches either in the result or in the regexp itself?
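To make the problem concrete, here is a minimal sketch of what the list-context match returns for the sample string. It assumes nothing beyond core Perl. Undefined captures are printed as the word undef purely for display.

```perl
use strict;
use warnings;

my $string = 'This is "another problem..."';

# In list context, m//g returns every capture group of every match,
# so each token ($1) is followed by the inner group ($2), which is
# undef whenever the word branch matched.
my @matches = $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;

print join(' | ', map { defined $_ ? $_ : 'undef' } @matches), "\n";
```

Each of the seven tokens is followed by one extra element, so the array ends up with fourteen entries instead of seven.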
In your "does not work" example, I think you mean \2, not \1.
You'd have to iterate through the matches to do this:
push @matches, "$1" while $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;
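On Perl 5.10 or later, the /p modifier makes the whole match available as ${^MATCH}, so the outer group is unnecessary:
my @matches;
push @matches, ${^MATCH} while $string =~ /([^-\s\w])\1*|[-\w]+/pg;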
The same loop also works with an outer capture group and $1:
my @matches;
push @matches, $1 while $string =~ /(([^-\s\w])\2*|[-\w]+)/g;
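Or match in list context and keep only every other element, i.e. the outer group of each match. The counter has to start at 0 so that the pre-increment keeps the first element:
my $i = 0;
my @matches = grep ++$i % 2, $string =~ /(([^-\s\w])\2*|[-\w]+)/g;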
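The variants above can be checked side by side. This is a minimal sketch, assuming Perl 5.10 or later for the /p modifier. The array names are made up for illustration, and the grep counter starts at 0 so that the first element of each ($1, $2) pair is the one kept.

```perl
use strict;
use warnings;

my $string = 'This is "another problem..."';

# Variant 1: scalar-context loop, keep only the outer group $1.
my @by_loop;
push @by_loop, $1 while $string =~ /(([^-\s\w])\2*|[-\w]+)/g;

# Variant 2: /p exposes the whole match as ${^MATCH}, so no outer
# group is needed and the backreference becomes \1.
my @by_match;
push @by_match, ${^MATCH} while $string =~ /([^-\s\w])\1*|[-\w]+/pg;

# Variant 3: list context yields ($1, $2) per match; keep elements
# 1, 3, 5, ... (counting from 1), which are the $1 values.
my $i = 0;
my @by_grep = grep { ++$i % 2 } $string =~ /(([^-\s\w])\2*|[-\w]+)/g;

print join(' ', @$_), "\n" for \@by_loop, \@by_match, \@by_grep;
```

All three arrays should hold the same seven tokens. The loop variants avoid building the intermediate list with the unwanted inner captures.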
In Perl, there's more than one way to do it (TMTOWTDI):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $str='Here\'s a (good, bad, ..., ?) example to be used in this "reg-ex" test.';
# NB: grep { length } drops the empty fields split returns between tokens
# (a plain grep { $_ } would also drop a "0" token)
my @matches = grep { length } split(/
\s* # discard possible leading whitespace
(
\.{3} # ellipsis (must come before punct)
|
\w+\-\w+ # hyphenated words
|
\w+\'(?:\w+)? # contractions and possessives, e.g. Here's
|
\w+ # other words
|
[[:punct:]] # other punctuation chars
)
/x,$str);
print Dumper(\@matches);
will print:
$VAR1 = [
'Here\'s',
'a',
'(',
'good',
',',
'bad',
',',
'...',
',',
'?',
')',
'example',
'to',
'be',
'used',
'in',
'this',
'"',
'reg-ex',
'"',
'test',
'.'
];