Fast prime factorization module
I am looking for an implementation or clear algorithm for getting the prime factors of N in either python, pseudocode or anything else well-readable. There are a few requirements/constraints:
- N is between 1 and ~20 digits
- No pre-calculated lookup table, memoization is fine though
- Need not to be mathematically proven (e.g. could rely on the Goldbach conjecture if needed)
- Need not to be precise, is allowed to be probabilistic/deterministic if needed
I need a fast prime factorization algorithm, not only for itself, but for usage in many other algorithms like calculating the Euler phi(n).
I have tried other algorithms from Wikipedia and such but either I couldn't understand them (ECM) or I couldn't c开发者_C百科reate a working implementation from the algorithm (Pollard-Brent).
I am really interested in the Pollard-Brent algorithm, so any more information/implementations on it would be really nice.
Thanks!
EDIT
After messing around a little I have created a pretty fast prime/factorization module. It combines an optimized trial division algorithm, the Pollard-Brent algorithm, a miller-rabin primality test and the fastest primesieve I found on the internet. gcd is a regular Euclid's GCD implementation (binary Euclid's GCD is much slower then the regular one).
Bounty
Oh joy, a bounty can be acquired! But how can I win it?
- Find an optimization or bug in my module.
- Provide alternative/better algorithms/implementations.
The answer which is the most complete/constructive gets the bounty.
And finally the module itself:
import random
def primesbelow(N):
# http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
#""" Input N>=6, Returns a list of primes, 2 <= p < N """
correction = N % 6 > 1
N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
sieve = [True] * (N // 3)
sieve[0] = False
for i in range(int(N ** .5) // 3 + 1):
if sieve[i]:
k = (3 * i + 1) | 1
sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
# http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
if n < 1:
raise ValueError("Out of bounds, first argument must be > 0")
elif n <= 3:
return n >= 2
elif n % 2 == 0:
return False
elif n < _smallprimeset:
return n in smallprimeset
d = n - 1
s = 0
while d % 2 == 0:
d //= 2
s += 1
for repeat in range(precision):
a = random.randrange(2, n - 2)
x = pow(a, d, n)
if x == 1 or x == n - 1: continue
for r in range(s - 1):
x = pow(x, 2, n)
if x == 1: return False
if x == n - 1: break
else: return False
return True
# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
if n % 2 == 0: return 2
if n % 3 == 0: return 3
y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
g, r, q = 1, 1, 1
while g == 1:
x = y
for i in range(r):
y = (pow(y, 2, n) + c) % n
k = 0
while k < r and g==1:
ys = y
for i in range(min(m, r-k)):
y = (pow(y, 2, n) + c) % n
q = q * abs(x-y) % n
g = gcd(q, n)
k += m
r *= 2
if g == n:
while True:
ys = (pow(ys, 2, n) + c) % n
g = gcd(abs(x - ys), n)
if g > 1:
break
return g
smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
factors = []
for checker in smallprimes:
while n % checker == 0:
factors.append(checker)
n //= checker
if checker > n: break
if n < 2: return factors
while n > 1:
if isprime(n):
factors.append(n)
break
factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
n //= factor
if sort: factors.sort()
return factors
def factorization(n):
factors = {}
for p1 in primefactors(n):
try:
factors[p1] += 1
except KeyError:
factors[p1] = 1
return factors
totients = {}
def totient(n):
if n == 0: return 1
try: return totients[n]
except KeyError: pass
tot = 1
for p, exp in factorization(n).items():
tot *= (p - 1) * p ** (exp - 1)
totients[n] = tot
return tot
def gcd(a, b):
if a == b: return a
while b > 0: a, b = b, a % b
return a
def lcm(a, b):
return abs((a // gcd(a, b)) * b)
If you don't want to reinvent the wheel, use the library sympy
pip install sympy
Use the function sympy.ntheory.factorint
Given a positive integer
n
,factorint(n)
returns a dict containing the prime factors ofn
as keys and their respective multiplicities as values. For example:
Example:
>>> from sympy.ntheory import factorint
>>> factorint(10**20+1)
{73: 1, 5964848081: 1, 1676321: 1, 137: 1}
You can factor some very large numbers:
>>> factorint(10**100+1)
{401: 1, 5964848081: 1, 1676321: 1, 1601: 1, 1201: 1, 137: 1, 73: 1, 129694419029057750551385771184564274499075700947656757821537291527196801: 1}
There is no need to calculate smallprimes
using primesbelow
, use smallprimeset
for that.
smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)
Divide your primefactors
into two functions for handling smallprimes
and other for pollard_brent
, this can save a couple of iterations as all the powers of smallprimes will be divided from n.
def primefactors(n, sort=False):
factors = []
limit = int(n ** .5) + 1
for checker in smallprimes:
print smallprimes[-1]
if checker > limit: break
while n % checker == 0:
factors.append(checker)
n //= checker
if n < 2: return factors
else :
factors.extend(bigfactors(n,sort))
return factors
def bigfactors(n, sort = False):
factors = []
while n > 1:
if isprime(n):
factors.append(n)
break
factor = pollard_brent(n)
factors.extend(bigfactors(factor,sort)) # recurse to factor the not necessarily prime factor returned by pollard-brent
n //= factor
if sort: factors.sort()
return factors
By considering verified results of Pomerance, Selfridge and Wagstaff and Jaeschke, you can reduce the repetitions in isprime
which uses Miller-Rabin primality test. From Wiki.
- if n < 1,373,653, it is enough to test a = 2 and 3;
- if n < 9,080,191, it is enough to test a = 31 and 73;
- if n < 4,759,123,141, it is enough to test a = 2, 7, and 61;
- if n < 2,152,302,898,747, it is enough to test a = 2, 3, 5, 7, and 11;
- if n < 3,474,749,660,383, it is enough to test a = 2, 3, 5, 7, 11, and 13;
- if n < 341,550,071,728,321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.
Edit 1: Corrected return call of if-else
to append bigfactors to factors in primefactors
.
Even on the current one, there are several spots to be noticed.
- Don't do
checker*checker
every loop, uses=ceil(sqrt(num))
andchecher < s
- checher should plus 2 each time, ignore all even numbers except 2
- Use
divmod
instead of%
and//
You should probably do some prime detection which you could look here, Fast algorithm for finding prime numbers?
You should read that entire blog though, there is a few algorithms that he lists for testing primality.
There's a python library with a collection of primality tests (including incorrect ones for what not to do). It's called pyprimes. Figured it's worth mentioning for posterity's purpose. I don't think it includes the algorithms you mentioned.
You could factorize up to a limit then use brent to get higher factors
from fractions import gcd
from random import randint
def brent(N):
if N%2==0: return 2
y,c,m = randint(1, N-1),randint(1, N-1),randint(1, N-1)
g,r,q = 1,1,1
while g==1:
x = y
for i in range(r):
y = ((y*y)%N+c)%N
k = 0
while (k<r and g==1):
ys = y
for i in range(min(m,r-k)):
y = ((y*y)%N+c)%N
q = q*(abs(x-y))%N
g = gcd(q,N)
k = k + m
r = r*2
if g==N:
while True:
ys = ((ys*ys)%N+c)%N
g = gcd(abs(x-ys),N)
if g>1: break
return g
def factorize(n1):
if n1==0: return []
if n1==1: return [1]
n=n1
b=[]
p=0
mx=1000000
while n % 2 ==0 : b.append(2);n//=2
while n % 3 ==0 : b.append(3);n//=3
i=5
inc=2
while i <=mx:
while n % i ==0 : b.append(i); n//=i
i+=inc
inc=6-inc
while n>mx:
p1=n
while p1!=p:
p=p1
p1=brent(p)
b.append(p1);n//=p1
if n!=1:b.append(n)
return sorted(b)
from functools import reduce
#n= 2**1427 * 31 #
n= 67898771 * 492574361 * 10000223 *305175781* 722222227*880949 *908909
li=factorize(n)
print (li)
print (n - reduce(lambda x,y :x*y ,li))
I just ran into a bug in this code when factoring the number 2**1427 * 31
.
File "buckets.py", line 48, in prettyprime
factors = primefactors.primefactors(n, sort=True)
File "/private/tmp/primefactors.py", line 83, in primefactors
limit = int(n ** .5) + 1
OverflowError: long int too large to convert to float
This code snippet:
limit = int(n ** .5) + 1
for checker in smallprimes:
if checker > limit: break
while n % checker == 0:
factors.append(checker)
n //= checker
limit = int(n ** .5) + 1
if checker > limit: break
should be rewritten into
for checker in smallprimes:
while n % checker == 0:
factors.append(checker)
n //= checker
if checker > n: break
which will likely perform faster on realistic inputs anyway. Square root is slow — basically the equivalent of many multiplications —, smallprimes
only has a few dozen members, and this way we remove the computation of n ** .5
from the tight inner loop, which is certainly helpful when factoring numbers like 2**1427
. There's simply no reason to compute sqrt(2**1427)
, sqrt(2**1426)
, sqrt(2**1425)
, etc. etc., when all we care about is "does the [square of the] checker exceed n
".
The rewritten code doesn't throw exceptions when presented with big numbers, and is roughly twice as fast according to timeit
(on sample inputs 2
and 2**718 * 31
).
Also notice that isprime(2)
returns the wrong result, but this is okay as long as we don't rely on it. IMHO you should rewrite the intro of that function as
if n <= 3:
return n >= 2
...
精彩评论