python c extension for standard deviation

2023-01-10 06:46 问答作者：

I'm writing a c extension to calculate he standard deviation. Performance is important because it will be performed over large data sets. I'm having a hard time figuring out how to get the value of pyobject once I get the item from a list. This is my first time writing a c extension for python and any help is appreciated. Apparently I don't know how to use the code sample button correctly开发者_Go百科 :(

This is what I have so far:

    #include <Python.h>
static PyObject*
func(PyObject *self, PyObject *args)
{
  PyObject *list, *item;
  Py_ssize_t i, len;
  if (!PyArg_UnpackTuple(args, "func", 1, 1, &list)){
    return NULL;
  }
  printf("hello world\n");
  Py_INCREF(list);
  len = PyList_GET_SIZE(list);
  for (i=0;i<len;i++){
    item = PyList_GET_ITEM(list, i);
    PyObject_Print(item,stdout,0);
  }
  return list;
}

static char func_doc[] = "This function calculates standard deviation.";

static PyMethodDef std_methods[] = {
  {"func", func, METH_VARARGS, func_doc},
  {NULL, NULL}
};

PyMODINIT_FUNC
initstd(void)
{
  Py_InitModule3("std", std_methods, "This is a sample docstring.");
}

You may be reinventing the wheel. There are several scientific computing libraries for Python, such as SciPy and Numpy, which are mostly wrappers around C libraries, that implement functions such as standard deviation.

Once you have item, you can get its float value with PyNumber_Float:

PyObject* floatitem = PyNumber_Float(item);

Now you need to check and exit on error (if(!floatitem) return 0 -- or a goto to a location where you decref anything you may have incref'd in the previous part of your code, e.g. in your case list). If no error, PyFloat_AsDouble gives you the required double value for use in the rest of your C-coded loop:

double ditem = PyFloat_AsDouble(floatitem);

after which you can decref floatitem and go your merry way. Don't worry overmuch about conversion overhead in PyNumber_Float -- there won't be any if you're passed a list of floats in the first place;-). If you do still worry (would rather give an error if somebody does pass a non-float requiring conversion) you can use PyFloat_Check if you insist (but I'd suggest at least special-casing int and long items unless you want truly perplexed and unhappy users;-). In a similar vein, I'd also strongly recommend studying and using PySequence_Fast and friends, rather than astonishing users by specifically requiring lists rather than other types of sequences!-).

Just to mention that there is almost certainly a better way than writing a C extension.

First option is to use NumPy. In the comment you have on the other answer you mention that it is expensive to convert the list to an array. This may be true if the standard deviation calculation is the only bit you are doing with the data which is highly unlikely.

Barring that, I would go for Cython. Here is a comparison of Cython and NumPy. Cython underperforms NumPy in this case, but more importantly the code implemented for csum can trivially be changed to compute the standard deviation.

Have you considered using cython to write your extension. It is perfect for this type of thing

This method will be limited by the number of items in the list.

Another design would keep a running total and let you add points until you overflowed the double.

If you want simple statistics over large datasets, you can randomly sample a subset of the data and take the average and standard deviation of that. That will have a "standard error" of approximation, and the more samples you take, the smaller that will be. If you don't need high precision of the statistics, you don't need to read all the data.

继续阅读：c performance python standard-deviation

python c extension for standard deviation

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？