numpy calling sse2 via ctypes

2023-01-03 17:14 问答作者：

In brief, I am trying to call into a shared library from python, more specifically, from numpy. The shared library is implemented in C using sse2 instructions. Enabling optimisation, i.e. building the library with -O2 or –O1, I am facing strange segfaults when calling into the shared library via ctypes. Disabling optimisation (-O0), everything works out as expected, as is the case when linking the library to a c-program directly (optimised or not). Attached you find a snipped which exhibits the delineated behaviour on my system. With optimisation enabled, gdb reports a segfault in __builtin_ia32_loadupd (__P) at emmintrin.h:113. The value of __P is reported as optimised out.

test.c:

#include <emmintrin.h>
#include <complex.h>
void test(const int m, const double* x, double complex* y) {

    int i;
    __m128d _f, _x, _b;
    double complex f __attribute__( (aligned(16)) );
    double comple开发者_如何学JAVAx b __attribute__( (aligned(16)) );
    __m128d* _p;

    b = 1;
    _b = _mm_loadu_pd( (double *) &b );

    _p = (__m128d*) y;

    for(i=0; i<m; ++i) {
        f = cexp(-I*x[i]);
        _f = _mm_loadu_pd( (double *) &f );
        _x = _mm_loadu_pd( (double *) &x[i] );      
        _f = _mm_shuffle_pd(_f, _f, 1);
        *_p = _mm_add_pd(*_p, _f);
        *_p = _mm_add_pd(*_p, _x); 
        *_p = _mm_mul_pd(*_p,_b);
        _p++;
    }
    return;
}

Compiler flags: gcc -o libtest.so -shared -std=c99 -msse2 -fPIC -O2 -g -lm test.c

test.py:

import numpy as np
import os

def zerovec_aligned(nr, dtype=np.float64, boundary=16):
    '''Create an aligned array of zeros.
    '''
    size = nr * np.dtype(dtype).itemsize
    tmp = np.zeros(size + boundary, dtype=np.uint8)
    address = tmp.__array_interface__['data'][0]
    offset = boundary - address % boundary
    return tmp[offset:offset + size].view(dtype=dtype)


lib = np.ctypeslib.load_library('libtest', '.' )
lib.test.restype = None
lib.test.argtypes = [np.ctypeslib.ctypes.c_int,
                     np.ctypeslib.ndpointer(np.float64, flags=('C', 'A') ),
                     np.ctypeslib.ndpointer(np.complex128, flags=('C', 'A', 'W') )]


n = 13
y = zerovec_aligned(n, dtype=np.complex128)
x = np.ones(n, dtype=np.float64)
# x = zerovec_aligned(n, dtype=np.float64)
# x[:] = 1.

lib.test(n,x,y)

Calling test from C works as expected:

call_from_c.c:

#include <stdio.h>
#include <complex.h>
#include <stdlib.h>
#include <emmintrin.h>

void test(const int m, const double* x, double complex* y);

int main() {

    int i; 
    const int n = 11;
    double complex *y = (double complex*) _mm_malloc(n*sizeof(double complex), 16);
    double *x = (double *) malloc(n*sizeof(double));
    for(i=0; i<n; ++i) {
        x[i] = 1;
        y[i] = 0;
    }

    test(n, x, y);
    for(i=0; i<n; ++i)
            printf("[%f %f]\n", creal(y[i]), cimag(y[i]));

    return 1;

}

Compile and call:

gcc -std=c99 -otestc -msse2 -L. -ltest call_from_c.c

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:.

./testc

... works.

My system:

Ubuntu Linux i686 2.6.31-22-generic
Compiler: gcc (Ubuntu 4.4.1-4ubuntu9)
Python: Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15) [GCC 4.4.1]
Numpy: 1.4.0

I have taken provisions (cf. python code) that y is aligned and the alignment of x should not matter (I think; explicitly aligning x does not solve the problem though).

Note also that i use _mm_loadu_pd instead of _mm_load_pd when loading b and f. For the C-only version _mm_load_pd works (as expected). However, when calling the function via ctypes using _mm_load_pd always segfaults (independent of optimisation).

I have tried several days to sort out this issue without success ... and I am on the verge beating my monitor to death. Any input welcome. Daniel

I just got bitten by this trying to call some SSE-code from python, the problem seems to be that GCC wants to assume that the stack is aligned on 16-byte boundaries (the largest native type on the architecture, i.e. the SSE-types), and calculates all offset with that assumption. When that assumption is false, the SSE-instructions will trap.

The answer seems to be to compile with

gcc -mstackrealign

which changes the function prologues to always align the stack to 16 bytes.

Try building your extension using numpy build system to discount potential cflags/ldflags differences: http://projects.scipy.org/numpy/wiki/NumpySconsExtExamples

Have you try upgrading to Numpy 1.5.0b2. Just run the following (but be careful it might break other things (you will have to recompile all pyrex):

sudo easy_install -U numpy

I was having similar problems with ctypes when I was trying to use H5PY (I had to recompile the .deb to get with the latest version of numpy) and also there was major issues with weave that the latest upgrade fixed.

继续阅读：ctypes numpy python sse2

numpy calling sse2 via ctypes

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？