How does this isPrime method in java using regex work [duplicate]
Possible Duplicate:
How to determine if a number is a prime with regex?
Here is the code:
public static boolean isPrime(int n) {
return !new String(new char[n]).matches(".?|(..+?)\\1+");
}开发者_JS百科
Can someone walk me through what the regex translates to in English and explain why it works?
Recall what a prime number is and what it isn't. First of all, this negates the regex match, so the regex matches on composite numbers.
It first creates a string of length n, its characters are irrelevant as long as they are matched by .
.
That means the number is either 0
or 1
(which are not prime):
.?
or it must have two divisors greater than one:
(..+?)\1+
The first divisor is handled by the capturing group (..+?)
which will match at least two characters (i.e. represent a number greater than or equal to two). The +?
is a lazy quantifier so it will try to match as little as possible; this likely just speeds up the process.
The \1
is a backreference matching the exact same thing the first group matched, this is repeated at least once using +
. This repetition represents the second factor of the number to check. So if the group matches a characters and the \1+
repeats this b − 1 times you got yourself a representation of a ċ b. If this matches, n was a composite number and thus not prime.
Regexes do backtracking in trying to create a match, so if it doesn't work with \1
containing two characters it will try with three, four, etc. until either a match is found or the group captures more than half the string.
So, e.g. for 14 the group would match two characters and \1
would then be repeated seven times, mimicking the factors of the number to test. Since this matches it has factors other than itself and one and thus isn't prime.
5 on the other hand would try with two characters in the group, then three and giving up there (since aaa
cannot exist more than once in aaaaa
). Thus five is prime.
Here is a more thorough explanation, although once you figure it out mathematically it's blindingly obvious and trivial.
.?
is one or none character
(..+?)
is at least 2 characters but with a reluctant qualifier (meaning it tries to match as little characters as possible) and the parentheses means it's a capturing group
\\1+
is repeating the first capturing group at least once
this means that the regex will only match strings with 0, 1 or any non trivial multiple (2 or more) of 2 or more
in other words it matches if n can be written as k*l
with both k
and l
greater or equal than 2 (one of the properties of a non-prime number)
however it's easier to just do
for(int i=2;i*i<=n;i++)if(n%i ==0)return false;
return true;
it does the exact same thing but stops sooner (on sqrt(n)
instead of n
)
精彩评论