Optimizing my Cython/Numpy code? Only a 30% performance gain so far

2023-02-17 03:14 问答作者：

Is there anything I'开发者_如何学运维ve forgotten to do here in order to speed things up a bit? I'm trying to implement an algorithm described in a book called Tuning Timbre Spectrum Scale. Also---if all else fails, is there a way for me to just write this part of the code in C, then be able to call it from python?

import numpy as np
cimport numpy as np

# DTYPE = np.float
ctypedef np.float_t DTYPE_t

np.seterr(divide='raise', over='raise', under='ignore', invalid='raise')

"""
I define a timbre as the following 2d numpy array:
[[f0, a0], [f1, a1], [f2, a2]...] where f describes the frequency
of the given partial and a is its amplitude from 0 to 1. Phase is ignored.
"""

#Test Timbre
# cdef np.ndarray[DTYPE_t,ndim=2] t1 = np.array( [[440,1],[880,.5],[(440*3),.333]])

# Calculates the inherent dissonance of one timbres of the above form
# using the diss2Partials function
cdef DTYPE_t diss1Timbre(np.ndarray[DTYPE_t,ndim=2] t):
    cdef DTYPE_t runningDiss1
    runningDiss1 = 0.0
    cdef unsigned int len = np.shape(t)[0]
    cdef unsigned int i
    cdef unsigned int j
    for i from 0 <= i < len:
        for j from i+1 <= j < len:
            runningDiss1 += diss2Partials(t[i], t[j])
    return runningDiss1

# Calculates the dissonance between two timbres of the above form 
cdef DTYPE_t diss2Timbres(np.ndarray[DTYPE_t,ndim=2] t1, np.ndarray[DTYPE_t,ndim=2] t2):
    cdef DTYPE_t runningDiss2
    runningDiss2 = 0.0
    cdef unsigned int len1 = np.shape(t1)[0]
    cdef unsigned int len2 = np.shape(t2)[0]
    runningDiss2 += diss1Timbre(t1)
    runningDiss2 += diss1Timbre(t2)
    cdef unsigned int i1
    cdef unsigned int i2
    for i1 from 0 <= i1 < len1:
        for i2 from 0 <= i2 < len2:
            runningDiss2 += diss2Partials(t1[i1], t2[i2])
    return runningDiss2

cdef inline DTYPE_t float_min(DTYPE_t a, DTYPE_t b): return a if a <= b else b

# Calculates the dissonance of two partials of the form [f,a]
cdef DTYPE_t diss2Partials(np.ndarray[DTYPE_t,ndim=1] p1, np.ndarray[DTYPE_t,ndim=1] p2):
    cdef DTYPE_t f1 = p1[0]
    cdef DTYPE_t f2 = p2[0]
    cdef DTYPE_t a1 = abs(p1[1])
    cdef DTYPE_t a2 = abs(p2[1])

    # In order to insure that f2 > f1:
    if (f2 < f1):
        (f1,f2,a1,a2) = (f2,f1,a2,a1)

    # Constants of the dissonance curves
    cdef DTYPE_t _xStar
    _xStar = 0.24
    cdef DTYPE_t _s1
    _s1 = 0.021
    cdef DTYPE_t _s2
    _s2 = 19
    cdef DTYPE_t _b1
    _b1 = 3.5
    cdef DTYPE_t _b2
    _b2 = 5.75

    cdef DTYPE_t a = float_min(a1,a2)
    cdef DTYPE_t s = _xStar/(_s1*f1 + _s2)
    return (a * (np.exp(-_b1*s*(f2-f1)) - np.exp(-_b2*s*(f2-f1)) ) )

cpdef dissTimbreScale(np.ndarray[DTYPE_t,ndim=2] t,np.ndarray[DTYPE_t,ndim=1] s):
    cdef DTYPE_t currDiss
    currDiss = 0.0;
    cdef unsigned int i
    for i from 0 <= i < s.size:
        currDiss += diss2Timbres(t, transpose(t,s[i]))
    return currDiss

cdef np.ndarray[DTYPE_t,ndim=2] transpose(np.ndarray[DTYPE_t,ndim=2] t, DTYPE_t ratio):
    return np.dot(t, np.array([[ratio,0],[0,1]]))

Link to code: Cython Code

Here are some things that I noticed:

Use t1.shape[0] instead of np.shape(t1)[0] and in so on in other places.
Don't use len as a variable because it is a built-in function in Python (not for speed, but for good practice). Use L or something like that.
Don't pass two-element arrays to functions unless you really need to. Cython checks the buffer every time you do pass an array. So, when using diss2Partials(t[i], t[j]) do diss2Partials(t[i,0], t[i,1], t[j,0], t[j,1]) instead and redefine diss2Partials appropriately.
Don't use abs, or at least not the Python one. It is having to convert your C double to a Python float, call the abs function, then convert back to a C double. It would probably be better to make an inlined function like you did with float_min.
Calling np.exp is doing a similar thing to using abs. Change np.exp to exp and add from libc.math cimport exp to your imports at the top.
Get rid of the transpose function completely. The np.dot is really slowing things down, but there really is no need for matrix multiplication here anyway. Rewrite your dissTimbreScale function to create an empty matrix, say t2. Before the current loop, set the second column of t2 to be equal to the second column of t (using a loop preferably, but you could probably get away with a Numpy operation here). Then, inside of the current loop, put in a loop that sets the first column of t2 equal to the first column of t times s[i]. That's what your matrix multiplication was really doing. Then just pass t2 as the second parameter to diss2Timbres instead of the one returned by the transpose function.

Do 1-5 first because they are rather easy. Number 6 may take a little more time, effort and maybe experimentation, but I suspect that it may also give you a significant boost in speed.

In your code:

for i from 0 <= i < len:
    for j from i+1 <= j < len:
        runningDiss1 += diss2Partials(t[i], t[j])
return runningDiss1

bounds checking is performed for each array lookup, use the decorator @cython.boundscheck(False) before the function, and then cast to an unsigned int type before using i and j as the indices. Look up the cython for Numpy tutorial for more info.

I would profile your code in order to see which function takes the most time. If it is diss2Timbres you may benefit from the package "numexpr".

I compared Python/Cython and Numexpr for one of my functions (link to SO). Depending on the size of the array, numexpr outperformed both, Cython and Fortran.

NOTE: Just figured out this post is really old...

继续阅读：cython numpy python

Optimizing my Cython/Numpy code? Only a 30% performance gain so far

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？