
How can I pass large arrays between numpy and R?

I'm using Python and NumPy/SciPy to do regex and stemming for a text processing application. But I want to use some of R's statistical packages as well.

What's the best way to pass the data from python to R? (And back?)

Also, I need to back up the array to disk at some point, so I'm open to saving from Python and loading in R if that's the best solution. The matrices are pretty big (e.g. 100,000 x 10,000), so using sparse matrices might also be nice.

Apologies if this is a repost. I haven't been able to find anything that puts all these pieces together.


  • Have you already looked into RPy? It's a python interface to R. I guess that would spare you the data handling.

  • To back up your NumPy arrays you can use pickle. Since pickle seems to create a lot of overhead when saving huge data, NumPy arrays are best saved using the HDF standard. Here's an article covering that: http://www.shocksolution.com/2010/01/10/storing-large-numpy-arrays-on-disk-python-pickle-vs-hdf5adsf/
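For the save-from-Python, load-in-R route the question mentions, a sparse matrix can also go through a Matrix Market text file instead of HDF. This is a different technique from the one in the linked article, shown only as a minimal sketch: it assumes SciPy is installed, and the matrix size, random data, and the file name `counts.mtx` are made up for illustration.

```python
import os
import tempfile

import numpy as np
from scipy import sparse
from scipy.io import mmread, mmwrite

# Hypothetical stand-in for a term-document count matrix (mostly zeros).
n_docs, n_terms, n_entries = 1000, 200, 500
rng = np.random.default_rng(0)
rows = rng.integers(0, n_docs, size=n_entries)
cols = rng.integers(0, n_terms, size=n_entries)
vals = rng.integers(1, 10, size=n_entries).astype(float)

# Entries that land on the same (row, col) pair are summed on conversion to CSR.
counts = sparse.coo_matrix((vals, (rows, cols)), shape=(n_docs, n_terms)).tocsr()

# Write only the nonzero entries to a Matrix Market (.mtx) text file.
path = os.path.join(tempfile.mkdtemp(), "counts.mtx")
mmwrite(path, counts)

# Read the file back in Python to check the round trip.
restored = mmread(path).tocsr()
round_trip_ok = np.allclose(restored.toarray(), counts.toarray())
print(restored.shape, restored.nnz, round_trip_ok)
```

On the R side, `readMM("counts.mtx")` from the Matrix package should load the same file as a sparse matrix, and `writeMM` goes the other way; neither R call was tested here. Because the file is plain text it can get large and slow for very big matrices, so it may suit the occasional backup better than frequent exchange.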


Use Rpy, http://rpy.sourceforge.net/, to call R from Python.

The caveat is that your R and Python versions both need to match exactly the versions the Rpy binary was built for, so you need to be careful with the installation.


I cannot comment on "large data" shared between R and Python, but I have had a much easier time working with pyRserve than with RPy or RPy2.

That being said, I am curious about the text processing you are doing. Python obviously has a lot to offer on the text processing side, but there is a lot on the statistical side too, in packages like NLTK and the Pattern package from CLiPS. Are you just more comfortable doing stats in R, or is there something specific missing in Python?
