
Using Cython to wrap a library that wraps another library

My goal is to use Cython to wrap the Apophenia library, a C library for scientific computing.

This is an effort to avoid reinventing the wheel, and Apophenia itself tries to do the same by basing its structures on those from the GNU Scientific Library:

typedef struct {
  gsl_vector *vector;
  gsl_matrix *matrix;
  gsl_vector *weights;
  apop_names *names;
  ...
} apop_data;

Apophenia provides lots of vector/matrix operations that the GSL either doesn't provide or provides a little awkwardly, but if the GSL has a function, there's no point rewriting it. You should be able to write C code that jumps between the apop_data set as a whole and its GSL parts as often as needed, e.g.:

apop_data *dataset = apop_text_to_data("infile.csv"); //fill the matrix element
gsl_matrix *minv = apop_matrix_inverse(dataset->matrix); //the inverse of a matrix is a matrix
apop_data *dinv = apop_matrix_to_data(minv);
apop_data *identity_matrix = apop_dot(dataset, dinv); // I = D * D^-1
dataset->vector = gsl_vector_alloc(10);
gsl_vector_set_all(dataset->vector, 1);

I'm not sure how to wrap this in Cython. The typical method seems to be to provide a Python-side structure that includes an internal copy of the C struct being wrapped:

"""I'm omitting the Cython declarations of the C structs and functions,
 which are just translations of the C declarations. Let those be in c_apop."""

cdef class apop_data:
   cdef c_apop.apop_data *d

   def set(self, row, col, val):
       c_apop.apop_data_set(self.d, row, col, val)

   def get(self, row, col):
       return c_apop.apop_data_get(self.d, row, col)

   [et cetera]


cdef class gsl_vector:
   cdef c_apop.gsl_vector *v

   def set(self, row, val):
       c_apop.gsl_vector_set(self.v, row, val)

   def get(self, row):
       return c_apop.gsl_vector_get(self.v, row)

   [et cetera]

But now we're stuck, because if we were to get the vector element from the data set,

 pyd = apop_data(10)
 v = pyd.d.vector

v is a raw C gsl_vector, not a Python object, so the next line can't be v.get(0) or v.set(0, 1).

We could add methods named vector_get and vector_set to the apop_data class that return a Python-wrapped gsl_vector, but that creates its own issues: if the user reallocates the C vector underlying the Python vector obtained from pyv = pyd.get_vector(), how do we guarantee that pyd.d.vector is reallocated with it?

I've tried a couple of things, and I feel like I'm missing the point every time. Any suggestions on how to best design the Cython classes for this situation?


The C structure should never be exposed to the Python side. I took a quick look at the library, and it does not seem to do anything out of the ordinary.
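As a rough illustration of keeping the struct hidden, here is a pure-Python model, not real Cython. The names _CData, Data, vector_get, vector_set, vector_realloc and vector_size are hypothetical, and a plain list stands in for the gsl_vector pointer. In an actual wrapper the same shape would be a cdef class holding a private c_apop.apop_data pointer.

```python
class _CData:
    """Hypothetical stand-in for the C apop_data struct."""

    def __init__(self, size):
        # Plays the role of the gsl_vector *vector field.
        self.vector = [0.0] * size


class Data:
    """Wrapper whose struct is a private attribute, never handed to callers."""

    def __init__(self, size):
        self._d = _CData(size)

    def vector_get(self, i):
        # Read through the struct on every call.
        return self._d.vector[i]

    def vector_set(self, i, val):
        self._d.vector[i] = float(val)

    def vector_realloc(self, size):
        # Reallocation happens in exactly one place, so the field
        # inside the struct is always the current one.
        self._d.vector = [0.0] * size

    def vector_size(self):
        return len(self._d.vector)


d = Data(3)
d.vector_set(0, 1)
d.vector_realloc(10)
```

Because callers never hold the vector itself, there is no second handle that could go stale after a reallocation.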

The only situation you have to track is when the library actually reallocates the underlying vector. Functions that do so usually take a pointer to a pointer and update the pointer value to point to the newly allocated structure.

Why do you need to expose pyd.get_vector at all?
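If a vector object does turn out to be needed, one possible design is a view that stores a reference to its owner instead of its own copy of the pointer, and looks the vector up through the owner on every call. The sketch below is a pure-Python model under that assumption, not real Cython: _CStruct, DataSet, VectorView and realloc are hypothetical names, and a list stands in for the gsl_vector pointer.

```python
class _CStruct:
    """Hypothetical stand-in for the C apop_data struct."""

    def __init__(self):
        self.vector = None  # plays the role of gsl_vector *vector


class VectorView:
    """View that never caches the vector; it re-reads it from the owner."""

    def __init__(self, owner):
        # Holding the owner also keeps the struct alive while the view exists.
        self._owner = owner

    def _current(self):
        vec = self._owner._d.vector
        if vec is None:
            raise ValueError("vector has not been allocated")
        return vec

    def get(self, i):
        return self._current()[i]

    def set(self, i, val):
        self._current()[i] = float(val)

    def realloc(self, size, fill=0.0):
        # Write the new allocation back into the owner's struct,
        # so every other view sees it on its next access.
        self._owner._d.vector = [float(fill)] * size

    def __len__(self):
        return len(self._current())


class DataSet:
    def __init__(self):
        self._d = _CStruct()

    def get_vector(self):
        return VectorView(self)


pyd = DataSet()
pyv = pyd.get_vector()
pyv.realloc(3, 1.0)
pyv.set(0, 5)
```

With this layout a reallocation made through one view is visible through the data set and through any other view, because none of them keeps a private pointer.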
