Optimizing my Cython/Numpy code? Only a 30% performance gain so far
Is there anything I'开发者_如何学运维ve forgotten to do here in order to speed things up a bit? I'm trying to implement an algorithm described in a book called Tuning Timbre Spectrum Scale. Also---if all else fails, is there a way for me to just write this part of the code in C, then be able to call it from python?
import numpy as np
cimport numpy as np
# DTYPE = np.float
ctypedef np.float_t DTYPE_t
np.seterr(divide='raise', over='raise', under='ignore', invalid='raise')
"""
I define a timbre as the following 2d numpy array:
[[f0, a0], [f1, a1], [f2, a2]...] where f describes the frequency
of the given partial and a is its amplitude from 0 to 1. Phase is ignored.
"""
#Test Timbre
# cdef np.ndarray[DTYPE_t,ndim=2] t1 = np.array( [[440,1],[880,.5],[(440*3),.333]])
# Calculates the inherent dissonance of one timbres of the above form
# using the diss2Partials function
cdef DTYPE_t diss1Timbre(np.ndarray[DTYPE_t,ndim=2] t):
cdef DTYPE_t runningDiss1
runningDiss1 = 0.0
cdef unsigned int len = np.shape(t)[0]
cdef unsigned int i
cdef unsigned int j
for i from 0 <= i < len:
for j from i+1 <= j < len:
runningDiss1 += diss2Partials(t[i], t[j])
return runningDiss1
# Calculates the dissonance between two timbres of the above form
cdef DTYPE_t diss2Timbres(np.ndarray[DTYPE_t,ndim=2] t1, np.ndarray[DTYPE_t,ndim=2] t2):
cdef DTYPE_t runningDiss2
runningDiss2 = 0.0
cdef unsigned int len1 = np.shape(t1)[0]
cdef unsigned int len2 = np.shape(t2)[0]
runningDiss2 += diss1Timbre(t1)
runningDiss2 += diss1Timbre(t2)
cdef unsigned int i1
cdef unsigned int i2
for i1 from 0 <= i1 < len1:
for i2 from 0 <= i2 < len2:
runningDiss2 += diss2Partials(t1[i1], t2[i2])
return runningDiss2
cdef inline DTYPE_t float_min(DTYPE_t a, DTYPE_t b): return a if a <= b else b
# Calculates the dissonance of two partials of the form [f,a]
cdef DTYPE_t diss2Partials(np.ndarray[DTYPE_t,ndim=1] p1, np.ndarray[DTYPE_t,ndim=1] p2):
cdef DTYPE_t f1 = p1[0]
cdef DTYPE_t f2 = p2[0]
cdef DTYPE_t a1 = abs(p1[1])
cdef DTYPE_t a2 = abs(p2[1])
# In order to insure that f2 > f1:
if (f2 < f1):
(f1,f2,a1,a2) = (f2,f1,a2,a1)
# Constants of the dissonance curves
cdef DTYPE_t _xStar
_xStar = 0.24
cdef DTYPE_t _s1
_s1 = 0.021
cdef DTYPE_t _s2
_s2 = 19
cdef DTYPE_t _b1
_b1 = 3.5
cdef DTYPE_t _b2
_b2 = 5.75
cdef DTYPE_t a = float_min(a1,a2)
cdef DTYPE_t s = _xStar/(_s1*f1 + _s2)
return (a * (np.exp(-_b1*s*(f2-f1)) - np.exp(-_b2*s*(f2-f1)) ) )
cpdef dissTimbreScale(np.ndarray[DTYPE_t,ndim=2] t,np.ndarray[DTYPE_t,ndim=1] s):
cdef DTYPE_t currDiss
currDiss = 0.0;
cdef unsigned int i
for i from 0 <= i < s.size:
currDiss += diss2Timbres(t, transpose(t,s[i]))
return currDiss
cdef np.ndarray[DTYPE_t,ndim=2] transpose(np.ndarray[DTYPE_t,ndim=2] t, DTYPE_t ratio):
return np.dot(t, np.array([[ratio,0],[0,1]]))
Link to code: Cython Code
Here are some things that I noticed:
- Use
t1.shape[0]
instead ofnp.shape(t1)[0]
and in so on in other places. - Don't use
len
as a variable because it is a built-in function in Python (not for speed, but for good practice). Use L or something like that. - Don't pass two-element arrays to functions unless you really need to. Cython checks the buffer every time you do pass an array. So, when using
diss2Partials(t[i], t[j])
dodiss2Partials(t[i,0], t[i,1], t[j,0], t[j,1])
instead and redefinediss2Partials
appropriately. - Don't use
abs
, or at least not the Python one. It is having to convert your C double to a Python float, call the abs function, then convert back to a C double. It would probably be better to make an inlined function like you did withfloat_min
. - Calling
np.exp
is doing a similar thing to usingabs
. Changenp.exp
toexp
and addfrom libc.math cimport exp
to your imports at the top. - Get rid of the
transpose
function completely. Thenp.dot
is really slowing things down, but there really is no need for matrix multiplication here anyway. Rewrite yourdissTimbreScale
function to create an empty matrix, sayt2
. Before the current loop, set the second column oft2
to be equal to the second column oft
(using a loop preferably, but you could probably get away with a Numpy operation here). Then, inside of the current loop, put in a loop that sets the first column oft2
equal to the first column oft
timess[i]
. That's what your matrix multiplication was really doing. Then just passt2
as the second parameter todiss2Timbres
instead of the one returned by thetranspose
function.
Do 1-5 first because they are rather easy. Number 6 may take a little more time, effort and maybe experimentation, but I suspect that it may also give you a significant boost in speed.
In your code:
for i from 0 <= i < len:
for j from i+1 <= j < len:
runningDiss1 += diss2Partials(t[i], t[j])
return runningDiss1
bounds checking is performed for each array lookup, use the decorator @cython.boundscheck(False)
before the function, and then cast to an unsigned int type before using i and j as the indices. Look up the cython for Numpy tutorial for more info.
I would profile your code in order to see which function takes the most time. If it is diss2Timbres
you may benefit from the package "numexpr".
I compared Python/Cython and Numexpr for one of my functions (link to SO). Depending on the size of the array, numexpr outperformed both, Cython and Fortran.
NOTE: Just figured out this post is really old...
精彩评论