
What are the implications of calling NumPy's C API functions from multiple threads?

This is risky business, and I understand the Global Interpreter Lock to be a formidable foe of parallelism. However, if I'm using NumPy's C API (specifically the PyArray_DATA macro on a NumPy array), are there potential consequences to invoking it from multiple concurrent threads?

Note that I will still own the GIL and not be releasing it with NumPy's threading support. Also, even if NumPy makes no guarantees about thread safety but PyArray_DATA is thread-safe in practice, that's good enough for me.

I'm running Python 2.6.6 with NumPy 1.3.0 on Linux.


Answering my own question here, but after poking into the source code for NumPy 1.3.0, I believe the answer is: Yes, PyArray_DATA is thread-safe.

  1. PyArray_DATA is defined in ndarrayobject.h:

    #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))
    
  2. The PyArrayObject struct type is defined in the same file; the field of interest is:

    char *data;
    

    So now, the question is whether accessing data from multiple threads is safe or not.

  3. Creating a new NumPy array from scratch (i.e., not deriving it from an existing data structure) passes a NULL data pointer to PyArray_NewFromDescr, defined in arrayobject.c.

  4. This causes PyArray_NewFromDescr to invoke PyDataMem_NEW in order to allocate memory for the PyArrayObject's data field. This is simply a macro for malloc:

    #define PyDataMem_NEW(size) ((char *)malloc(size))
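
Steps 1 through 4 can be sketched in plain C without the NumPy headers. This is only a rough model: MockArrayObject, MOCK_ARRAY_DATA, MOCK_DATAMEM_NEW and the helper functions are hypothetical names made up for illustration, copying just the shape of the two macros quoted above. It shows that the macro is nothing more than a pointer read, and that two arrays created from scratch end up with different buffers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyArrayObject: only the field that
   PyArray_DATA reads is modeled here. */
typedef struct {
    char *data;
} MockArrayObject;

/* Same shape as the PyArray_DATA macro: a cast plus a field read. */
#define MOCK_ARRAY_DATA(obj) ((void *)(((MockArrayObject *)(obj))->data))

/* Same shape as PyDataMem_NEW: a thin wrapper around malloc. */
#define MOCK_DATAMEM_NEW(size) ((char *)malloc(size))

/* Mimics creating an array from scratch: the buffer is freshly allocated.
   Returns 0 on success, -1 if the allocation fails. */
int mock_array_init(MockArrayObject *arr, size_t nbytes)
{
    assert(arr != NULL);
    arr->data = MOCK_DATAMEM_NEW(nbytes);
    return arr->data != NULL ? 0 : -1;
}

/* Releases the buffer and clears the pointer. */
void mock_array_free(MockArrayObject *arr)
{
    assert(arr != NULL);
    free(arr->data);
    arr->data = NULL;
}

/* Returns 1 if two separately created arrays own different buffers and
   repeated macro calls on one array give the same pointer; returns 0
   otherwise, including when an allocation fails. */
int buffers_are_distinct(size_t nbytes)
{
    MockArrayObject a, b;
    void *first_read, *second_read;
    int ok;

    if (mock_array_init(&a, nbytes) != 0) {
        return 0;
    }
    if (mock_array_init(&b, nbytes) != 0) {
        mock_array_free(&a);
        return 0;
    }

    /* The macro has no side effects: two reads give the same pointer. */
    first_read = MOCK_ARRAY_DATA(&a);
    second_read = MOCK_ARRAY_DATA(&a);
    ok = (first_read == second_read) && (first_read != MOCK_ARRAY_DATA(&b));

    mock_array_free(&a);
    mock_array_free(&b);
    return ok;
}
```

Calling buffers_are_distinct with a nonzero size should return 1 unless malloc fails. Nothing here touches Python or the GIL; it only mirrors the pointer handling.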
    

In summary, PyArray_DATA only reads a pointer field, so calling it from several threads is safe. And because each array created from scratch gets its own malloc'ed buffer, it is safe to write to separately created arrays from different threads.
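
To illustrate that conclusion, here is a minimal pthreads sketch, assuming a mock struct in place of the real PyArrayObject (MockArrayObject, MOCK_ARRAY_DATA, FillTask, fill_worker and run_parallel_fill are all hypothetical names, not part of NumPy). Each thread fetches the data pointer of its own array through the macro and writes only to that buffer. It does not involve Python or the GIL, so it models the memory access pattern only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N_THREADS 4
#define N_ELEMS 1000

/* Hypothetical stand-in for PyArrayObject (only the data field). */
typedef struct {
    char *data;
} MockArrayObject;

/* Same shape as the PyArray_DATA macro: a cast plus a field read. */
#define MOCK_ARRAY_DATA(obj) ((void *)(((MockArrayObject *)(obj))->data))

/* Work description for one thread: which array to fill, and with what. */
typedef struct {
    MockArrayObject *array;
    double fill_value;
} FillTask;

/* Thread body: read the data pointer and write only to that buffer. */
static void *fill_worker(void *arg)
{
    FillTask *task = (FillTask *)arg;
    double *values = (double *)MOCK_ARRAY_DATA(task->array);
    size_t i;

    assert(values != NULL);
    for (i = 0; i < N_ELEMS; i++) {
        values[i] = task->fill_value;
    }
    return NULL;
}

/* Creates one array per thread, fills them concurrently, then checks the
   contents. Returns 0 if every array holds its own fill value, -1 on
   any allocation, thread creation, or verification failure. */
int run_parallel_fill(void)
{
    MockArrayObject arrays[N_THREADS];
    FillTask tasks[N_THREADS];
    pthread_t threads[N_THREADS];
    int started = 0;
    int status = 0;
    int t;
    size_t i;

    /* Each array gets its own buffer, as with arrays created from scratch. */
    for (t = 0; t < N_THREADS; t++) {
        arrays[t].data = (char *)malloc(N_ELEMS * sizeof(double));
        if (arrays[t].data == NULL) {
            status = -1;
        }
        tasks[t].array = &arrays[t];
        tasks[t].fill_value = (double)(t + 1);
    }

    if (status == 0) {
        for (; started < N_THREADS; started++) {
            if (pthread_create(&threads[started], NULL, fill_worker,
                               &tasks[started]) != 0) {
                status = -1;
                break;
            }
        }
    }

    /* Wait for every thread that was actually started. */
    for (t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
    }

    /* Verify that no thread disturbed another thread's buffer. */
    if (status == 0) {
        for (t = 0; t < N_THREADS; t++) {
            const double *values = (const double *)MOCK_ARRAY_DATA(&arrays[t]);
            for (i = 0; i < N_ELEMS; i++) {
                if (values[i] != (double)(t + 1)) {
                    status = -1;
                }
            }
        }
    }

    for (t = 0; t < N_THREADS; t++) {
        free(arrays[t].data);
    }
    return status;
}
```

On Linux this would typically be built with something like gcc -pthread. The sketch says nothing about two threads writing to the same array, or about an array being resized or freed while another thread still holds its data pointer; those cases need their own synchronization.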
