开发者

Negative regex for Perl string pattern match

I have this regex:

if($string =~ m/^(Clinton|[^Bush]|Reagan)/i)
  {print "$string\n"};

I want t开发者_如何学运维o match with Clinton and Reagan, but not Bush.

It's not working.


Your regex does not work because [] defines a character class, but what you want is a lookahead:

(?=) - Positive look ahead assertion foo(?=bar) matches foo when followed by bar
(?!) - Negative look ahead assertion foo(?!bar) matches foo when not followed by bar
(?<=) - Positive look behind assertion (?<=foo)bar matches bar when preceded by foo
(?<!) - Negative look behind assertion (?<!foo)bar matches bar when NOT preceded by foo
(?>) - Once-only subpatterns (?>\d+)bar Performance enhancing when bar not present
(?(x)) - Conditional subpatterns
(?(3)foo|fu)bar - Matches foo if 3rd subpattern has matched, fu if not
(?#) - Comment (?# Pattern does x y or z)

So try: (?!bush)


Sample text:

Clinton said
Bush used crayons
Reagan forgot

Just omitting a Bush match:

$ perl -ne 'print if /^(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot

Or if you really want to specify:

$ perl -ne 'print if /^(?!Bush)(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot


Your regex says the following:

/^         - if the line starts with
(          - start a capture group
Clinton|   - "Clinton" 
|          - or
[^Bush]    - Any single character except "B", "u", "s" or "h"
|          - or
Reagan)   - "Reagan". End capture group.
/i         - Make matches case-insensitive 

So, in other words, your middle part of the regex is screwing you up. As it is a "catch-all" kind of group, it will allow any line that does not begin with any of the upper or lower case letters in "Bush". For example, these lines would match your regex:

Our president, George Bush
In the news today, pigs can fly
012-3123 33

You either make a negative look-ahead, as suggested earlier, or you simply make two regexes:

if( ($string =~ m/^(Clinton|Reagan)/i) and
    ($string !~ m/^Bush/i) ) {
   print "$string\n";
}

As mirod has pointed out in the comments, the second check is quite unnecessary when using the caret (^) to match only beginning of lines, as lines that begin with "Clinton" or "Reagan" could never begin with "Bush".

However, it would be valid without the carets.


What's wrong with using two regexs (or three)? This makes your intentions more clear and may even improve your performance:

if ($string =~ /^(Clinton|Reagan)/i && $string !~ /Bush/i) { ... }

if (($string =~ /^Clinton/i || $string =~ /^Reagan/i)
        && $string !~ /Bush/i) {
    print "$string\n"
}


If my understanding is correct then you want to match any line which has Clinton and Reagan, in any order, but not Bush. As suggested by Stuck, here is a version with lookahead assertions:

#!/usr/bin/perl

use strict;
use warnings;

my $regex = qr/
    (?=.*clinton)  
    (?!.*bush) 
    .*reagan       
    /ix;

while (<DATA>) {
    chomp;
    next unless (/$regex/);
    print $_, "\n";
}


__DATA__
shouldn't match - reagan came first, then clinton, finally bush
first match - first two: reagan and clinton
second match - first two reverse: clinton and reagan
shouldn't match - last two: clinton and bush
shouldn't match - reverse: bush and clinton
shouldn't match - and then came obama, along comes mary
shouldn't match - to clinton with perl

Results

first match - first two: reagan and clinton
second match - first two reverse: clinton and reagan

as desired it matches any line which has Reagan and Clinton in any order.

You may want to try reading how lookahead assertions work with examples at http://www252.pair.com/comdog/mastering_perl/Chapters/02.advanced_regular_expressions.html

they are very tasty :)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜