Inverse of Perl regex X modifier

I would like to use a Perl regular expression to match strings like this:

spaM
s p a m
sp Am
S   p a   m

Looking at Perl's x modifier, I should be able to do this:

<?php
echo preg_match('#spam#ix', 's p a   m');
?>

But this prints 0 (false). The x modifier actually ignores whitespace in the regex, not in the string being analyzed. How would I do it the other way around? That is, ignore whitespace in the string being analyzed rather than in my regex? I'm aware there are multi-step ways to do this, such as first stripping all whitespace from the string, but I wanted to know whether there is a one-step regex solution.


Truthfully, I think you are better off stripping the whitespace and then matching. Since this is what you mean to do, your code will be clearer than if you hunt for a magic regex or inject whitespace patterns between letters.

The Perl for this would look something like this:

my $string = "S   p A m";
(my $string_no_ws = $string) =~ s/\s//g;
if ($string_no_ws =~ /spam/i) {
  #do something
}

Actually, you can do the test without a regex if you want to, using index:

my $string = "S   p A m";
(my $lc_string_no_ws = lc $string) =~ s/\s//g;
if (index($lc_string_no_ws, 'spam') >= 0) {
  #do something
}
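
Since the question is written in PHP, the same strip-then-match idea could be sketched there roughly as follows. The helper name contains_ignoring_whitespace is made up for illustration and does not come from the answers above; preg_replace and stripos are standard PHP functions.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the original thread).
function contains_ignoring_whitespace(string $haystack, string $needle): bool
{
    // Remove every whitespace character from the subject first.
    // preg_replace() returns null on error, so fall back to an empty string.
    $stripped = preg_replace('/\s+/', '', $haystack) ?? '';

    // stripos() is a case-insensitive substring search; no regex is needed here.
    return stripos($stripped, $needle) !== false;
}

var_dump(contains_ignoring_whitespace('S   p a   m', 'spam')); // bool(true)
var_dump(contains_ignoring_whitespace('s p o m', 'spam'));     // bool(false)
```

This mirrors the Perl index version: normalise the string once, then do a plain substring test.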


The x modifier works the other way around. It lets you put extraneous whitespace in the regex, and that whitespace is ignored when matching:

preg_match('# s p a m #ix', $subject)

This will only ever match "spam" (in any letter case) within $subject.

To allow arbitrary whitespace in the string being searched, you need to insert \s* between the letters:

preg_match('# S \s* P \s* A \s* M #ix', 's p a   m');

You can automate that a bit by splitting the word into characters and joining them with \s*:

$regex = join('\s*', str_split("spam", 1));
preg_match("#$regex#ix", "s p a m");
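
If the word might contain regex metacharacters, each character probably needs escaping before the join. A minimal sketch, assuming single-byte (ASCII) input; spaced_regex is a hypothetical helper name, while preg_quote, str_split, and implode are standard PHP functions.

```php
<?php
// Hypothetical helper: builds a pattern that tolerates whitespace between characters.
function spaced_regex(string $word): string
{
    // Escape each character so that "." or "#" is matched literally;
    // the second argument also escapes the "#" delimiter used below.
    $chars = array_map(
        function ($c) { return preg_quote($c, '#'); },
        str_split($word)
    );

    // No x modifier: the pattern is generated, so it has no layout whitespace to ignore.
    return '#' . implode('\s*', $chars) . '#i';
}

echo preg_match(spaced_regex('spam'), 'S   p a   m'), "\n"; // 1
echo preg_match(spaced_regex('a.b'), 'a x b'), "\n";        // 0
```

Leaving out the x modifier is a deliberate choice in this sketch: with x, a literal space inside the word would be ignored by the regex engine.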


The /x modifier in Perl applies to the regex itself, not to the string being matched. To match the values you have, you want

/s\s*p\s*a\s*m\s*/i

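if order matters for the word spam. If it doesn't, then something like

/[spam \t\n\r]+/

would suffice. Note, though, that this matches any run of those characters, including whitespace alone or a word such as "maps".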
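
To see how differently the two patterns behave, here is a quick check in PHP (the question's language). The sample strings and variable names are made up for illustration.

```php
<?php
$ordered = '/s\s*p\s*a\s*m/i';   // letters must appear in this order
$class   = '/[spam \t\n\r]+/';   // any run of these characters, in any order

foreach (['S p A m', 'maps', 'example'] as $s) {
    printf(
        "%-8s ordered=%d class=%d\n",
        $s,
        preg_match($ordered, $s),
        preg_match($class, $s)
    );
}
// S p A m  ordered=1 class=1
// maps     ordered=0 class=1
// example  ordered=0 class=1
```

The character class is unanchored, so a single "a" or space anywhere in the string is enough for it to match, which is why "example" passes.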
