Shared Workspaces in R?

Does anyone have a recommendation for sharing workspaces and data frames in R? I'm working at a start-up and we have little experience working in larger production environments where lots of employees are all using the same data.

Is there a way to set permissions on data frames and share them? Or do orgs in our situation just store their data in a database like MySQL and just download it to data frames on a case-by-case basis?

Any tips would be greatly appreciated from those with experience in this area!

Thanks!


One approach would be to dump variables via save() to a shared location and have others read them in via load(). This has the added benefits of compression and fast reads and writes, since the format is binary.
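As a minimal sketch of that round trip (the directory, file name, and `sales` data frame are made up for illustration; a temporary directory stands in for a real shared location):

```r
# Hypothetical shared location; in practice this would be a network share.
shared_dir <- tempdir()
shared_file <- file.path(shared_dir, "sales.RData")

# Producer: save one or more objects by name.
sales <- data.frame(region = c("east", "west"), total = c(120, 95))
save(sales, file = shared_file)

# Consumer: load into a dedicated environment so that nothing in the
# caller's workspace is silently overwritten.
shared <- new.env()
loaded_names <- load(shared_file, envir = shared)

print(loaded_names)   # names of the objects that were restored
print(shared$sales)
```

Loading into a separate environment is a design choice rather than a requirement: plain `load(shared_file)` restores objects under their saved names in the current workspace, which can clobber a colleague's variable of the same name.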

You can of course also serialize to a file or a database. Or, if you must, even write human-readable files, but those will be the slowest to read back in.
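A rough sketch of those alternatives using base R only; the data frame and file names are illustrative, and the database step is left out (the raw vector from `serialize()` is simply what one might store in a binary column):

```r
df <- data.frame(id = 1:3, score = c(0.5, 1.25, 2))

# Binary serialization of a single object; the reader chooses the name.
rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)
df_rds <- readRDS(rds_file)

# In-memory serialization to a raw vector, e.g. for a database BLOB column.
raw_bytes <- serialize(df, connection = NULL)
df_raw <- unserialize(raw_bytes)

# Human-readable alternative: portable, but it has to be parsed and
# re-typed every time it is read.
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
df_csv <- read.csv(csv_file)
```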

Edit: As per the comments, here is how to change the file mode after saving (0444 makes the file read-only for everyone):

R> foo <- 1:3
R> save(foo, file="/tmp/SimpleDemo.RData")
R> Sys.chmod("/tmp/SimpleDemo.RData", mode="0444")
R> system("ls -l /tmp/SimpleDemo.RData")
-r--r--r-- 1 edd edd 62 2011-08-15 16:26 /tmp/SimpleDemo.RData


You can consider using stashR to set up a small server that hosts working datasets. It is much handier than loose files lying around and more direct than querying SQL again and again.

As for storing results, the better option is to store the scripts that produce them rather than only the results (so-called reproducible research) and manage those scripts in a VCS. This of course becomes painful with heavy computations, but then one can think of an automatic system that reacts to certain VCS changes and populates a store of intermediate results.
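One very simple way to approximate that idea, sketched here with file modification times instead of real VCS hooks: recompute a result only when its cache is missing or older than the script that produces it. The helper `cached_result()` and the convention that the script defines a variable called `result` are assumptions made for this example.

```r
# Hypothetical helper: rerun the script only when the cache is stale.
cached_result <- function(script, cache) {
  stale <- !file.exists(cache) ||
    file.info(cache)$mtime < file.info(script)$mtime
  if (stale) {
    env <- new.env()
    source(script, local = env)   # the script is assumed to define `result`
    saveRDS(env$result, cache)
  }
  readRDS(cache)
}

# Usage with a throwaway script standing in for a versioned analysis file.
script <- tempfile(fileext = ".R")
writeLines("result <- sum(1:10)", script)
cache <- tempfile(fileext = ".rds")

value <- cached_result(script, cache)   # runs the script and fills the cache
again <- cached_result(script, cache)   # served from the cache
```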


save() and load() are the best way to go about it. I additionally use source() to load libraries, so that the objects I store with save() are interpreted correctly when I load() them back.

I like to create all of my objects, save() them, and reuse them in subsequent sessions. But if, for example, I saved an xts time series object, its meta-structure will not be correctly identified after reloading until I execute library(xts).

To see this, you can run

str(xts1) #(xts1 is your xts object) 

before and after loading the xts library.

So it is possible to save() and share all sorts of objects, but you should remember to load the associated library/definition before you start reusing it.
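The same effect can be reproduced without xts. In this sketch a made-up S3 class called `reading` stands in for a package-defined class, and defining its `format()` method plays the role of `library(xts)`:

```r
# Hypothetical S3 class standing in for a package-defined class such as xts.
reading <- structure(list(value = 21.5, unit = "C"), class = "reading")

rdata <- tempfile(fileext = ".RData")
save(reading, file = rdata)
rm(reading)

load(rdata)

# The class attribute survives the round trip...
print(class(reading))

# ...but with no method in scope, format() falls back to the default.
before <- format(reading)

# Making the method available is the analogue of calling library(xts).
format.reading <- function(x, ...) paste(x$value, x$unit)
after <- format(reading)

print(after)
```

The saved file carries the data and the class label, not the code that gives the class its behaviour, which is why the definitions have to be loaded separately.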
