How does this regular expression work?
From this article,
/^1?$|^(11+?)\1+$/
checks whether a number(its value in unary) is prime or not.
Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;'
returns a list of prime numbers.
I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.
About the regex part,
^1?$
part is for counting 1 as not prime
^(11+?)\1+$
is for matching not prime numbers starting from 4.
What I do not understand is why is the ?
in the regex needed at all.
According to me /^1$|^(11+)\1+$/
should be just fine and actually
perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;'
gi开发者_如何学运维ves me the same set of prime numbers.
Is there any flaw in my understanding of the regular expression? Why are the ?
s needed?
Isn't ?
supposed to match zero or one occurrence of the expression preceding it?
The first ?
is for matching the empty string (i.e. 0) as non-prime. If you don't care whether the regexp matches 0, then it's not necessary.
The second ?
is only for efficiency. +
is normally "greedy", which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The +?
makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)
In this particular regexp, the (11+?)
means it tests divisibility by 2 ('11'
), then 3 ('111'
), then 4, etc. If you used (11+)
, it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the ?
it would waste time testing a lot of "potential" divisors that can't possibly work. It would still match non-prime numbers, just more slowly. (Also, $1
would be the largest divisor instead of the smallest one.)
The first ?
will make "" (the empty string, unary zero) not be a prime number. Zero is defined as not prime.
The second is different; it stops the regular expression from greedy-matching. It should improve the performance of the match greatly, since the first part of that section ((11+)
) won't consume almost the entire string before having to backtrack. If you omit the question mark, you're effectively testing whether odd n
is divisible by n-1
and so one down; if you include it, you're testing divisibility by two first and so on upwards. Obviously, numbers tend to be divisible by smaller factors more often, so your match will be faster.
精彩评论