How does this regular expression work?

2023-01-08 14:48 问答作者：

From this article,

/^1?$|^(11+?)\1+$/ checks whether a number(its value in unary) is prime or not.

Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;' returns a list of prime numbers.

I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.

About the regex part,

^1?$ part is for counting 1 as not prime

^(11+?)\1+$ is for matching not prime numbers starting from 4.

What I do not understand is why is the ? in the regex needed at all. According to me /^1$|^(11+)\1+$/ should be just fine and actually

perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;' gi开发者_如何学运维ves me the same set of prime numbers.

Is there any flaw in my understanding of the regular expression? Why are the ?s needed?

Isn't ? supposed to match zero or one occurrence of the expression preceding it?

The first ? is for matching the empty string (i.e. 0) as non-prime. If you don't care whether the regexp matches 0, then it's not necessary.

The second ? is only for efficiency. + is normally "greedy", which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The +? makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)

In this particular regexp, the (11+?) means it tests divisibility by 2 ('11'), then 3 ('111'), then 4, etc. If you used (11+), it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the ? it would waste time testing a lot of "potential" divisors that can't possibly work. It would still match non-prime numbers, just more slowly. (Also, $1 would be the largest divisor instead of the smallest one.)

The first ? will make "" (the empty string, unary zero) not be a prime number. Zero is defined as not prime.

The second is different; it stops the regular expression from greedy-matching. It should improve the performance of the match greatly, since the first part of that section ((11+)) won't consume almost the entire string before having to backtrack. If you omit the question mark, you're effectively testing whether odd n is divisible by n-1 and so one down; if you include it, you're testing divisibility by two first and so on upwards. Obviously, numbers tend to be divisible by smaller factors more often, so your match will be faster.

继续阅读：perl primes regex

How does this regular expression work?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？