How to exclude submatches in Perl?

I have to split a string into pieces containing words or special characters.

Let's say I have the string 'This is "another problem..."'. What I want to get is an array consisting of these pieces: ('This', 'is', '"', 'another', 'problem', '...', '"').

I have done this in JavaScript with the following RegExp which works fine:

string.match(/([^-\s\w])\1*|[-\w]+/g); // works

Using the same approach in Perl does not work: the subpattern I use to combine consecutive characters is returned as a match as well:

@matches = $string =~ m/(([^-\s\w])\2*|[-\w]+)/g; # does not work

Is there a way of getting rid of the subpatterns/submatches either in the result or in the regexp itself?


In your "does not work" example, I think you mean \2, not \1.

You'd have to iterate through the matches to do this:

push @matches, "$1" while $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;
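
To make the difference concrete, here is a minimal sketch using the string from the question. In list context, a /g match returns every capture group of every match, so the inner group is interleaved with the outer one; iterating in scalar context keeps only $1. The variable name @all is only illustrative.

```perl
use strict;
use warnings;

my $string = 'This is "another problem..."';

# List context with /g: returns ($1, $2) for each match, so the inner
# group shows up in the result (as undef for plain words).
my @all = $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;

# Scalar context with /g: one match per iteration, keep only $1.
my @matches;
push @matches, $1 while $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;

print scalar(@all), " captures, ", scalar(@matches), " tokens\n";
print join('|', @matches), "\n";   # This|is|"|another|problem|...|"
```

The list-context match leaves pos($string) reset once it is exhausted, so the while loop starts again from the beginning of the string.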


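# Perl 5.10+: with /p, ${^MATCH} holds the whole match, so no outer group is needed
my @matches;
push @matches, ${^MATCH} while $string =~ /([^-\s\w])\1*|[-\w]+/pg;

# Or wrap the whole pattern in an outer group and keep only $1
my @matches;
push @matches, $1 while $string =~ /(([^-\s\w])\2*|[-\w]+)/g;

# Or take all captures ($1, $2, $1, $2, ...) and keep every other one, starting with the first
my $i = 0;
my @matches = grep ++$i % 2, $string =~ /(([^-\s\w])\2*|[-\w]+)/g;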
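
The "keep every other capture" idea can also be written without a counter, by slicing the capture list at its even indices. This is only a sketch of that variation, using the string from the question; the name @captures is illustrative.

```perl
use strict;
use warnings;

my $string = 'This is "another problem..."';

# Captures arrive as ($1, $2, $1, $2, ...); $2 is undef for word matches.
my @captures = $string =~ /(([^-\s\w])\2*|[-\w]+)/g;

# Keep indices 0, 2, 4, ..., i.e. the outer group of each match.
my @matches = @captures[ grep { $_ % 2 == 0 } 0 .. $#captures ];

print join('|', @matches), "\n";   # This|is|"|another|problem|...|"
```

Selecting by index rather than by value means an undef or empty inner capture cannot shift the result.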


In Perl, there's more than one way to do it (TIMTOWTDI):

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str='Here\'s a (good, bad, ..., ?) example to be used in this "reg-ex" test.';

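# NB: split returns an empty field before each captured token; filter those out.
# Testing length rather than truth keeps a literal "0" token.

my @matches = grep { length $_ } split(/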
  \s*             # discard possible leading whitespace
  (
    \.{3}         # ellipsis (must come before punct)
  |
    \w+\-\w+      # hyphenated words
  |
    \w+\'(?:\w+)? # compound words
  | 
    \w+           # other words
  | 
    [[:punct:]]   # other punctuation chars
  )
/x,$str);

print Dumper(\@matches);

will print:

$VAR1 = [
      'Here\'s',
      'a',
      '(',
      'good',
      ',',
      'bad',
      ',',
      '...',
      ',',
      '?',
      ')',
      'example',
      'to',
      'be',
      'used',
      'in',
      'this',
      '"',
      'reg-ex',
      '"',
      'test',
      '.'
    ];
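
One detail worth checking when filtering split output: a truthiness filter such as grep { $_ } removes the empty fields, but it also removes a token that is the string "0". The sketch below compares it with a length-based filter, using a simplified pattern and a made-up input string; both are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input that contains a "0" token.
my $str = 'add 0 to 10';

# With a capturing group, split returns the separators too, with an
# empty field in front of each one.
my @fields = split /\s*(\w+|[[:punct:]])/, $str;

my @truthy   = grep { $_ } @fields;          # drops '' and also '0'
my @nonempty = grep { length $_ } @fields;   # drops only ''

print "truthy:   @truthy\n";     # truthy:   add to 10
print "nonempty: @nonempty\n";   # nonempty: add 0 to 10
```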