Statistics: combinations in Python
I need to compute combinations (nCr) in Python but cannot find the function to do that in the math, numpy, or stat libraries. Something like a function of the type:
comb = calculate_combinations(n, r)
I need the number of possible combinations, not the actual combinations, so itertools.combinations
does not interest me.
Finally, I want to avoid using factorials, as the numbers I'll be calculating the combinations for can get too big and the factorials are going to be monstrous.
This seems like a REALLY easy question to answer; however, I am being drowned in answers about generating all the actual combinations, which is not what I want.
See scipy.special.comb (scipy.misc.comb in older versions of scipy). When exact is False, it uses the gammaln function to obtain good precision without taking much time. In the exact case it returns an arbitrary-precision integer, which can take a long time to compute.
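A minimal usage sketch (assuming a reasonably recent SciPy):

from scipy.special import comb

print(comb(10, 5))              # 252.0 -- fast float approximation
print(comb(10, 5, exact=True))  # 252   -- exact arbitrary-precision integer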
Why not write it yourself? It's a one-liner or thereabouts:

from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import mul      # or: mul = lambda x, y: x * y
from fractions import Fraction

def nCk(n, k):
    return int(reduce(mul, (Fraction(n - i, i + 1) for i in range(k)), 1))
Test - printing Pascal's triangle:
>>> for n in range(17):
...     print(' '.join('%5d' % nCk(n, k) for k in range(n + 1)).center(100))
...
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
1 11 55 165 330 462 462 330 165 55 11 1
1 12 66 220 495 792 924 792 495 220 66 12 1
1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1
1 14 91 364 1001 2002 3003 3432 3003 2002 1001 364 91 14 1
1 15 105 455 1365 3003 5005 6435 6435 5005 3003 1365 455 105 15 1
1 16 120 560 1820 4368 8008 11440 12870 11440 8008 4368 1820 560 120 16 1
>>>
PS. Edited to replace int(round(reduce(mul, (float(n-i)/(i+1) for i in range(k)), 1))) with int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1)) so it won't give wrong answers for big n/k due to float rounding.
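For illustration (nCk_float is a hypothetical name for the replaced float variant; nCk is the Fraction version above), here is the kind of error the edit avoids:

from functools import reduce
from operator import mul

def nCk_float(n, k):  # the old float version: only ~16 significant digits survive
    return int(round(reduce(mul, (float(n - i) / (i + 1) for i in range(k)), 1)))

print(nCk_float(100, 50))  # close to, but not exactly, C(100, 50)
print(nCk(100, 50))        # 100891344545564193334812497256 -- exact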
A quick search on Google Code gives this (it uses the formula from @Mark Byers's answer):
def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in range(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0
choose()
is 10 times faster (tested on all 0 <= (n,k) < 1e3 pairs) than scipy.misc.comb()
if you need an exact answer.
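If you want to reproduce the comparison, something along these lines should work (scipy.misc.comb lives at scipy.special.comb in modern SciPy; the inputs here are arbitrary):

from timeit import timeit
from scipy.special import comb

print(timeit(lambda: choose(500, 250), number=1000))
print(timeit(lambda: comb(500, 250, exact=True), number=1000))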
def comb(N, k):  # from scipy.comb(), but MODIFIED!
    if (k > N) or (N < 0) or (k < 0):
        return 0
    val = 1
    top = N
    while top > (N - k):   # multiply the k largest factors of N!
        val *= top
        top -= 1
    n = 1
    while n < k + 1:       # divide out k!; each intermediate division is exact
        val //= n
        n += 1
    return val
If you want exact results and speed, try gmpy -- gmpy.comb should do exactly what you ask for, and it's pretty fast (of course, as gmpy's original author, I am biased ;-).
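For reference, a minimal sketch with the modern successor package gmpy2, whose comb has the same interface:

import gmpy2

print(gmpy2.comb(5, 2))          # mpz(10)
print(int(gmpy2.comb(100, 50)))  # convert to a plain Python int if needed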
If you want an exact result, use sympy.binomial. It seems to be the fastest method, hands down.
x = 1000000
y = 234050
%timeit scipy.misc.comb(x, y, exact=True)
1 loops, best of 3: 1min 27s per loop
%timeit gmpy.comb(x, y)
1 loops, best of 3: 1.97 s per loop
%timeit int(sympy.binomial(x, y))
100000 loops, best of 3: 5.06 µs per loop
A literal translation of the mathematical definition is quite adequate in a lot of cases (remembering that Python will automatically use big number arithmetic):
from math import factorial
def calculate_combinations(n, r):
return factorial(n) // factorial(r) // factorial(n-r)
For some inputs I tested (e.g. n=1000, r=500) this was more than 10 times faster than the reduce one-liner suggested in another (currently highest voted) answer. On the other hand, it is outperformed by the snippet provided by @J.F. Sebastian.
Starting with Python 3.8, the standard library includes the math.comb function to compute the binomial coefficient:

math.comb(n, k)

which is the number of ways to choose k items from n items without repetition, n! / (k! (n - k)!):
import math
math.comb(10, 5) # 252
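Two edge cases worth knowing (documented behavior of math.comb): it returns 0 when k > n, and it raises instead of silently coercing invalid inputs:

import math

math.comb(10, 11)   # 0 -- when k > n, the result is 0
math.comb(-1, 2)    # raises ValueError (negative arguments)
math.comb(10, 2.5)  # raises TypeError (non-integer arguments)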
Here's another alternative. This one was originally written in C++, so it can be backported to C++ for a finite-precision integer type (e.g. __int64). The advantages are that (1) it involves only integer operations, and (2) it avoids bloating the intermediate value by doing successive pairs of multiplication and division. I've tested the result against Nas Banov's Pascal triangle, and it gets the correct answers:
def choose(n, r):
    """Computes n! / (r! (n-r)!) exactly. Returns a Python int."""
    assert n >= 0
    assert 0 <= r <= n
    c = 1
    for num, denom in zip(range(n, n - r, -1), range(1, r + 1)):
        c = (c * num) // denom   # always exact: c equals C(n, denom) after each step
    return c
Rationale: To minimize the number of multiplications and divisions, we rewrite the expression as

    n!        n(n-1)...(n-r+1)
---------  =  ----------------
r!(n-r)!             r!

To avoid multiplication overflow as much as possible, we evaluate in the following STRICT order, from left to right:

n / 1 * (n-1) / 2 * (n-2) / 3 * ... * (n-r+1) / r

We can show that integer arithmetic operated in this order is exact (i.e. no roundoff error): after the i-th division, the running value is exactly C(n, i), which is always an integer.
You can write two simple functions that actually turn out to be about 5-8 times faster than using scipy.special.comb. In fact, you don't need to import any extra packages, and the functions are quite readable. The trick is to use memoization to store previously computed values, together with the definition of nCr:
# create a memoization dictionary
memo = {}

def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    if n in (1, 0):
        return 1
    if n in memo:
        return memo[n]
    value = n * factorial(n - 1)
    memo[n] = value
    return value

def ncr(n, k):
    """
    Choose k elements from a set of n elements - n must be larger than or equal to k
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n) // (factorial(k) * factorial(n - k))  # // keeps the result an exact int
If we compare times
from scipy.special import comb
%timeit comb(100,48)
>>> 100000 loops, best of 3: 6.78 µs per loop
%timeit ncr(100,48)
>>> 1000000 loops, best of 3: 1.39 µs per loop
Using dynamic programming, the time complexity is Θ(n²) and the space complexity is Θ(n):
def binomial(n, k):
    """ (int, int) -> int

              | c(n-1, k-1) + c(n-1, k), if 0 < k < n
    c(n, k) = | 1,                       if n = k
              | 1,                       if k = 0

    Precondition: n > k

    >>> binomial(9, 2)
    36
    """
    c = [0] * (n + 1)
    c[0] = 1
    for i in range(1, n + 1):
        c[i] = 1
        j = i - 1
        while j > 0:
            c[j] += c[j - 1]
            j -= 1
    return c[k]
If your program has an upper bound on n (say n <= N) and needs to repeatedly compute nCr (ideally far more than N times), using lru_cache can give you a huge performance boost:
from functools import lru_cache
@lru_cache(maxsize=None)
def nCr(n, r):
return 1 if r == 0 or r == n else nCr(n - 1, r - 1) + nCr(n - 1, r)
Constructing the cache (which is done implicitly) takes up to O(N^2) time. Any subsequent calls to nCr will return in O(1).
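A usage note (a sketch, assuming you want to pre-fill the cache explicitly): the first deep call recurses about n levels, which can hit Python's recursion limit for large N, so it is safer to fill the triangle bottom-up; afterwards every nCr(n, r) with n <= N is an O(1) lookup:

N = 1000
for m in range(N + 1):      # fill row by row: each call finds its two
    for r in range(m + 1):  # sub-results already cached, so recursion stays shallow
        nCr(m, r)

print(nCr(1000, 500))  # instant lookup now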
It's pretty easy with sympy.
import sympy
comb = sympy.binomial(n, r)
Using only the standard library distributed with Python:
import itertools

def nCk(n, k):
    return len(list(itertools.combinations(range(n), k)))
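This is Θ(C(n, k)) in both time and memory, so it only makes sense for tiny inputs. A slightly leaner variant of the same idea (nCk_lean is just an illustrative name) counts without materializing the list:

import itertools

def nCk_lean(n, k):
    # still enumerates every combination, but keeps only a running count
    return sum(1 for _ in itertools.combinations(range(n), k))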
The direct formula produces big integers when n is bigger than 20.

So, yet another response:

from functools import reduce  # reduce is in functools on Python 3
from operator import mul
from math import factorial

nCr = lambda n, r: reduce(mul, range(n - r + 1, n + 1), 1) // factorial(r)

Short, accurate and efficient, because it sticks with exact integer arithmetic throughout: nothing is lost to floating point.
It is more accurate and faster when compared to scipy.special.comb:

>>> from scipy.special import comb
>>> from functools import reduce
>>> from operator import mul
>>> from math import factorial
>>> nCr = lambda n, r: reduce(mul, range(n - r + 1, n + 1), 1) // factorial(r)
>>> comb(128, 20)
1.1965669823265365e+23
>>> nCr(128, 20)
119656698232656998274400  # accurate, no loss
>>> from timeit import timeit
>>> timeit(lambda: comb(128, 20))
8.231969118118286
>>> timeit(lambda: nCr(128, 20))
3.885951042175293
This function is very optimized.
def nCk(n, k):
    m = 0
    if k == 0:
        m = 1
    if k == 1:
        m = n
    if k >= 2:
        num, dem, op1, op2 = 1, 1, k, n
        while op1 >= 1:
            num *= op2
            dem *= op1
            op1 -= 1
            op2 -= 1
        m = num // dem
    return m
That's probably as fast as you can do it in pure Python for reasonably large inputs:

def choose(n, k):
    if k == n:
        return 1
    if k > n:
        return 0
    d, q = max(k, n - k), min(k, n - k)
    num = 1
    for m in range(d + 1, n + 1):
        num *= m
    denom = 1
    for m in range(1, q + 1):
        denom *= m
    return num // denom   # floor division keeps the result an exact int
Here is an efficient algorithm for you:

p = 1
for i = 1 ... r:
    p = p * (n - i + 1) / i
print(p)

For example, nCr(30, 7) = fact(30) / (fact(7) * fact(23)) = (30 * 29 * 28 * 27 * 26 * 25 * 24) / (1 * 2 * 3 * 4 * 5 * 6 * 7), so just running the loop from 1 to r gets the result.

In Python:

def nCr(n, r):
    p = 1
    for i in range(1, r + 1):
        # exact at every step: p * (n - i + 1) is always divisible by i
        p = p * (n - i + 1) // i
    return p

print(nCr(30, 7))  # 2035800
I timed 17 different functions from this thread and libraries linked here.
Since I feel it's a bit much to dump here, I put the code for the functions in a pastebin here.
The first test I did was to build Pascal's triangle to the 100th row. I used timeit to do this 100 times. The numbers below are the average time it took in seconds to build the triangle once.
gmpy2.gmpy2.comb 0.0012259269999998423
math.comb 0.007063110999999935
__main__.stdfactorial2 0.011469491
__main__.scipybinom 0.0120114319999999
__main__.stdfactorial 0.012105122
__main__.scipycombexact 0.012569045999999844
__main__.andrewdalke 0.01825201100000015
__main__.rabih 0.018472497000000202
__main__.kta 0.019374668000000383
__main__.wirawan 0.029312811000000067
scipy.special._basic.comb 0.03221609299999954
__main__.jfsmodifiedscipy 0.04332894699999997
__main__.rojas 0.04395155400000021
sympy.functions.combinatorial.factorials.binomial 0.3233529779999998
__main__.nasbanov 0.593365528
__main__.pantelis300 1.7780402499999999
You may notice that there are only 16 functions here. That's because the recursive() function couldn't complete this even once in a reasonable amount of time, so I had to exclude it from the timeit tests. Seriously, it's been going for hours.

I also timed various other types of inputs that not all of the above functions support. Keep in mind that I only ran each of these tests 10 times, because nCr is computationally expensive and I'm impatient.
Fractional values for n
__main__.scipybinom 0.011481370000000001
__main__.kta 0.01869513999999999
sympy.functions.combinatorial.factorials.binomial 6.33897291
Fractional values for r
__main__.scipybinom 0.010960040000000504
scipy.special._basic.comb 0.03681254999999908
sympy.functions.combinatorial.factorials.binomial 3.2962564499999987
Fractional values for n and r
__main__.scipybinom 0.008623409999998444
sympy.functions.combinatorial.factorials.binomial 3.690936439999999
Negative values for n
gmpy2.gmpy2.comb 0.010770989999997482
__main__.kta 0.02187850000000253
__main__.rojas 0.05104292999999984
__main__.nasbanov 0.6153183200000001
sympy.functions.combinatorial.factorials.binomial 3.0460310799999943
Negative fractional values for n, fractional values for r
sympy.functions.combinatorial.factorials.binomial 3.7689941699999965
The best solution currently, for maximum speed and versatility, would be a hybrid function that chooses between different algorithms depending on the inputs:
import typing

import gmpy2
import scipy.special
import sympy

def hybrid(n: typing.Union[int, float], k: typing.Union[int, float]) -> typing.Union[int, float]:
    # my own custom hybrid solution
    def is_integer(n):
        return isinstance(n, int) or n.is_integer()
    if k < 0:
        raise ValueError("k cannot be negative.")
    elif n == 0:
        return 0
    elif k == 0 or k == n:
        return 1
    elif is_integer(n) and is_integer(k):
        return int(gmpy2.comb(int(n), int(k)))
    elif n > 0:
        return scipy.special.binom(n, k)
    else:
        return float(sympy.binomial(n, k))
Since sympy.binomial() is so slow, the true ideal solution would be to combine the code of scipy.special.binom(), which performs well for fractions, with gmpy2.comb(), which performs well for ints. scipy's function and gmpy2's function are both written in C, which I am not very familiar with.
This is @killerT2333's code using the built-in memoization decorator.
from functools import lru_cache

@lru_cache()
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    return 1 if n in (1, 0) else n * factorial(n - 1)

@lru_cache()
def ncr(n, k):
    """
    Choose k elements from a set of n elements,
    n must be greater than or equal to k.
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n) // (factorial(k) * factorial(n - k))  # // keeps the result an exact int

print(ncr(6, 3))  # 20