How can I match strings that don't match a particular pattern in Perl?

2022-12-18 04:30 Q&A author:

I know that it is easy to match anything except a given character using a regular expression.

$text = "ab ac ad";
$text =~ s/[^c]*//g; # Match anything, except c.

$text is now "c".

I don't know how to "except" strings instead of characters. How would I "match anything, except 'ac'"? Tried [^(ac)] and [^"ac"] without success.

Is it possible at all?

The following solves the question as understood in the second sense described in Bart K.'s comment:

>> $text='ab ac ad';
>> $text =~ s/(ac)|./$1/g;   # use $1, not \1, on the replacement side
>> print $text;
ac

Also, 'abacadac' -> 'acac'

It should be noted though that in most practical applications negative lookaheads prove to be more useful than this approach.

If you just want to check if the string does not contain "ac", just use a negation.

$text = "ab ac ad";

print "ac not found" if $text !~ /ac/;

print "ac not found" unless $text =~ /ac/;

$text =~ s/[^c]*//g; // Match anything, except c.

@ssn, a couple of comments about your question:

"//" is not a comment in Perl. Only "#" is.
"[^c]*" - there is no need for the "*" there. "[^c]" means the character class composed of all characters except the letter "c". Then you use the /g modifier, meaning all such occurrences in the text will be replaced (in your example, with nothing). The "zero or more" ("*") modifier is therefore redundant.

How would I "match anything, except 'ac'" ? Tried [^(ac)] and [^"ac"] without success.

Please read the documentation on character classes (see "perldoc perlre" on your command line, or online at http://perldoc.perl.org/perlre.html). You'll see it states that, for the list of characters within the square brackets, the RE will "match any character from the list". That means order is not relevant and there are no "strings", only a list of characters. "()" and double quotes also have no special meaning inside the square brackets.
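To make that concrete, here is a small sketch (the helper name class_hits is made up for illustration) showing that [^(ac)] is just a negated set of the four characters "(", "a", "c" and ")", not a negated string:

```perl
use strict;
use warnings;

# class_hits is a hypothetical helper, not part of any module.
# [^(ac)] matches any single character that is NOT one of: ( a c )
sub class_hits {
    my ($text) = @_;
    my @hits = $text =~ /[^(ac)]/g;   # list context: every matching character
    return join '', @hits;
}

print "[", class_hits("ab ac ad"), "]\n";   # keeps b, two spaces and d
print "[", class_hits("(ca)"), "]\n";       # nothing survives
```

Every "a" and "c" is rejected individually, whether or not they sit next to each other, which is why the class cannot express "anything except the string ac".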

Now I'm not exactly sure why you're talking about matching but then giving an example of substitution. But to see if a string does not match the sub-string "ac" you just need to negate the match:

use strict; use warnings;
my $text = "ab ac ad";
if ($text !~ m/ac/) {
   print "Yey the text doesn't match 'ac'!\n"; # this shouldn't be printed
}

Say you have a string of text within which are embedded multiple occurrences of a substring. If you just want the text surrounding the sub-string, just remove all occurrences of the sub-string:

$text =~ s/ac//g;

If you want the reverse - to remove all text except for all occurrences of the sub-string, I would suggest something like:

use strict; use warnings;
my $text = "ab ac ad ac ae";
my $sub_str = "ac";
my @captured = $text =~ m/($sub_str)/g;
my $num = scalar @captured;
print (($sub_str x $num) . "\n");

This basically counts the number of times the sub-string appears in the text and prints the sub-string that number of times using the "x" operator. It's not very elegant; I'm sure a Perl guru could come up with something better.
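One possibly tidier variant, offered as a sketch rather than as the answerer's code: in list context a /g match without capture groups returns every matched string, so the matches can be joined directly. The helper name keep_only is an assumption for illustration.

```perl
use strict;
use warnings;

# keep_only is a hypothetical helper: it returns only the occurrences
# of $sub_str found in $text, concatenated in order.
sub keep_only {
    my ($text, $sub_str) = @_;
    # \Q...\E quotes regex metacharacters so $sub_str is matched literally.
    return join '', $text =~ /\Q$sub_str\E/g;
}

print keep_only("ab ac ad ac ae", "ac"), "\n";   # prints "acac"
```

Quoting with \Q...\E matters once the sub-string contains characters such as "." or "*" that would otherwise be treated as regex syntax.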

@ennuikiller:

my $text = "ab ac ad";
$text !~ s/(ac)//g; # Match anything, except ac.

This is incorrect, since it generates a warning ("Useless use of negative pattern binding (!~) in void context") under "use warnings" and doesn't do anything except remove all substrings "ac" from the text, which could be more simply written as I wrote above with:

$text =~ s/ac//g;

Update: In a comment on your question, you mentioned you want to clean wiki markup and remove balanced sequences of {{ ... }}. Section 6 of the Perl FAQ covers this: Can I use Perl regular expressions to match balanced text?

Consider the following program:

#! /usr/bin/perl

use warnings;
use strict;

use Text::Balanced qw/ extract_tagged /;

# for demo only
*ARGV = *DATA;

while (<>) {
  if (s/^(.+?)(?=\{\{)//) {
    print $1;
    my(undef,$after) = extract_tagged $_, "{{" => "}}";

    if (defined $after) {
      $_ = $after;
      redo;
    }
  }

  print;
}

__DATA__
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. {{delete me}} Sed quis
nulla ut dolor {{me too}} fringilla
mollis {{ quis {{ ac }} erat.

Its output:

Lorem ipsum dolor sit amet, consectetur
adipiscing elit.  Sed quis
nulla ut dolor  fringilla
mollis {{ quis  erat.

For your particular example, you could use

$text =~ s/[^ac]|a(?!c)|(?<!a)c//g;

That is, delete an a or a c only when it isn't part of an ac sequence.
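A quick way to see how the three alternatives behave is to wrap the substitution in a function and try a few inputs. This is only a sketch; the name strip_except_ac is invented for the example.

```perl
use strict;
use warnings;

# strip_except_ac is a hypothetical wrapper around the substitution above.
sub strip_except_ac {
    my ($text) = @_;
    # Delete: any char that is neither a nor c, an "a" not followed by "c",
    # or a "c" not preceded by "a". Only intact "ac" pairs survive.
    (my $copy = $text) =~ s/[^ac]|a(?!c)|(?<!a)c//g;
    return $copy;
}

print strip_except_ac("ab ac ad"), "\n";   # prints "ac"
print strip_except_ac("abacadac"), "\n";   # prints "acac"
print strip_except_ac("aacc"), "\n";       # prints "ac"
```

The lookarounds inspect neighbouring characters without consuming them, so an "a" and the "c" after it are each kept on their own merits.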

In general, this is tricky to do with a regular expression.

Say you don't want foo followed by optional whitespace and then bar in $str. Often, it's clearer and easier to check separately. For example:

die "invalid string ($str)"
  if $str =~ /^.*foo\s*bar/;

You might also be interested in an answer to a similar question, where I wrote

my $nofoo = qr/
  (      [^f] |
    f  (?! o) |
    fo (?! o  \s* bar)
  )*
/x;

my $pattern = qr/^ $nofoo bar /x;
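As a rough illustration of how that pattern behaves (the wrapper bar_without_foo and the sample strings are assumptions added here, not part of the linked answer), $nofoo can only advance past an "f" that does not start "foo", optional whitespace, "bar":

```perl
use strict;
use warnings;

# Same two patterns as above, repeated so the sketch is self-contained.
my $nofoo = qr/
  (      [^f] |
    f  (?! o) |
    fo (?! o  \s* bar)
  )*
/x;

my $pattern = qr/^ $nofoo bar /x;

# Hypothetical helper: true if some "bar" is reachable from the start
# of the string without stepping over "foo<optional whitespace>bar".
sub bar_without_foo {
    my ($str) = @_;
    return $str =~ $pattern ? 1 : 0;
}

for my $s ("xx bar", "foo baz bar", "foo bar", "foobar") {
    printf "%-12s %s\n", $s, bar_without_foo($s) ? "matches" : "no match";
}
```

With these inputs the first two strings should match and the last two should not, which is a good deal harder to read off the pattern than from a separate `=~ /foo\s*bar/` check.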

To understand the complication, read How Regexes Work by Mark Dominus. The engine compiles regular expressions into state machines. When it's time to match, it feeds the input string to the state machine and checks whether the state machine finishes in an accept state. So to exclude a string, you have to specify a machine that accepts all inputs except a particular sequence.

What might help is a /v regular expression switch that creates the state machine as usual but then complements the accept-state bit for all states. It's hard to say whether this would really be useful as compared with separate checks because a /v regular expression may still surprise people, just in different ways.

If you're interested in the theoretical details, see An Introduction to Formal Languages and Automata by Peter Linz.

You can also use index(), which returns -1 when the substring is not found:

$text = "ab ac ad";
print "ac not found" if ( index($text,"ac") == -1 );

You can easily modify this regex for your purpose.

use Test::More 0.88;

#Match any whole text that does not contain a string
my $re=qr/^(?:(?!ac).)*$/;
my $str='ab ac ad';

ok($str !~ $re);   # "!$str =~ $re" would parse as (!$str) =~ $re

$str='ab af ad';
ok($str=~$re);

done_testing();
