Using C++ libraries in an R package

What is the best way to make use of a C++ library in R, hopefully preserving the C++ data structures? I'm not at all a C++ user, so I'm not clear on the relative merits of the available approaches. The R-ext manual seems to suggest wrapping every C++ function in C. However, at least four or five other means of incorporating C++ exist.
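
As a rough illustration of what "wrapping every C++ function in C" means, here is a minimal sketch. It is not taken from the R-ext manual; the Counter class and the counter_* function names are hypothetical stand-ins for a real library. The C++ object is hidden behind an opaque pointer and reached only through functions with C linkage:

```cpp
#include <cassert>

// Hypothetical C++ class standing in for a library type.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    void add(int n) { value_ += n; }
    int value() const { return value_; }
private:
    int value_;
};

// C-linkage wrappers: callers see only an opaque pointer and plain functions.
extern "C" {
    void* counter_new(int start) { return new Counter(start); }

    void counter_add(void* handle, int n) {
        assert(handle != nullptr);
        static_cast<Counter*>(handle)->add(n);
    }

    int counter_value(const void* handle) {
        assert(handle != nullptr);
        return static_cast<const Counter*>(handle)->value();
    }

    void counter_free(void* handle) { delete static_cast<Counter*>(handle); }
}
```

Every method needs its own wrapper of this kind, and the functions would still have to be adapted to R's calling conventions before R could call them. That step is omitted here.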

Two options are packages with a similar lineage: Rcpp (maintained by the prolific overflower Dirk Eddelbuettel) and RcppTemplate, both on CRAN. What are the differences between the two?

Another package, rcppbind, is available on R-Forge and claims to take a different approach to binding C++ and R (I'm not knowledgeable enough to tell).

The package inline, available on CRAN, claims to allow inline C/C++. I'm not sure how this differs from the built-in functionality, aside from allowing the code to be inline with R.

And finally there is RSwig, which appears to be in the wild, but it is unclear how well supported it is, as the author's page hasn't been updated for years.

My question is: what are the relative merits of these different approaches? Which are the most portable and robust, and which are the easiest to implement? If you were planning to distribute a package on CRAN, which of the methods would you use?


First off, a disclaimer: I use Rcpp all the time. In fact, when RcppTemplate (as Rcpp had been renamed by that time) had already been orphaned and gone without updates for two years, I started to maintain it under its initial name of Rcpp (under which it had been contributed to RQuantLib). That was about a year ago, and I have made a couple of incremental changes that you can find documented in the ChangeLog.

Now RcppTemplate has very recently come back after a full thirty-five months without any update or fix. It contains interesting new code, but it appears not to be backwards compatible, so I won't use it where I already use Rcpp.

Rcppbind was not very actively maintained whenever I checked. Whit Armstrong also has a templated interface package called rabstraction.

Inline is something completely different: it eases the compile / link cycle by 'embedding' your program as an R character string that then gets compiled, linked, and loaded. I have talked to Oleg about having inline support Rcpp, which would be nice.

Swig is interesting too. Joe Wang did great work there and wrapped all of QuantLib for R. But when I last tried it, it no longer worked due to some changes in R internals. According to someone from the Swig team, Joe may still work on it. The goal of Swig is larger libraries anyway. This project could probably do with a revival but it is not without technical challenges.

Another mention should go to RInside which works with Rcpp and lets you embed R inside of C++ applications.

So to sum it up: Rcpp works well for me, especially for small exploratory projects where you just want to add a function or two. Its focus is ease of use, and it allows you to 'hide' some of the R internals that are not always fun to work with. I know of a number of other users whom I have helped on and off via email. So I would say go for this one.

My 'Intro to HPC with R' tutorials have some examples of Rcpp, RInside and inline.

Edit: So let's look at a concrete example (taken from the 'HPC with R Intro' slides and borrowed from Stephen Milborrow who took it from Venables and Ripley). The task is to enumerate all possible combinations of the determinant of a 2x2 matrix containing only single digits in each position. This can be done in clever vectorised ways (as we discuss in the tutorial slides) or by brute force as follows:

#include <Rcpp.h>

RcppExport SEXP dd_rcpp(SEXP v) {
  SEXP  rl = R_NilValue;        // Use this when there is nothing to be returned.
  char* exceptionMesg = NULL;   // msg var in case of error

  try {
    RcppVector<int> vec(v);     // vec parameter viewed as vector of ints
    int n = vec.size(), i = 0;
    if (n != 10000) 
       throw std::length_error("Wrong vector size");
    for (int a = 0; a < 9; a++)
      for (int b = 0; b < 9; b++)
        for (int c = 0; c < 9; c++)
          for (int d = 0; d < 9; d++)
            vec(i++) = a*b - c*d;

    RcppResultSet rs;           // Build result set to be returned as list to R
    rs.add("vec", vec);         // vec as named element with name 'vec'
    rl = rs.getReturnList();    // Get the list to be returned to R.
  } catch(std::exception& ex) {
    exceptionMesg = copyMessageToR(ex.what());
  } catch(...) {
    exceptionMesg = copyMessageToR("unknown reason");
  }

  if (exceptionMesg != NULL) 
     Rf_error(exceptionMesg);

  return rl;
}

If you save this as, say, dd.rcpp.cpp and have Rcpp installed, then simply use

PKG_CPPFLAGS=`Rscript -e 'Rcpp:::CxxFlags()'`  \
    PKG_LIBS=`Rscript -e 'Rcpp:::LdFlags()'`  \
    R CMD SHLIB dd.rcpp.cpp

to build a shared library. We use Rscript (or r) to ask Rcpp about its header and library locations. Once built, we can load and use this from R as follows:

dyn.load("dd.rcpp.so")

dd.rcpp <- function() {
    x <- integer(10000)
    res <- .Call("dd_rcpp", x)
    tabulate(res$vec)
}

In the same way, you can send vectors, matrices, ... of various R and C++ data types back and forth with ease. Hope this helps somewhat.
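
For anyone who wants to check the enumeration without R at hand, here is a plain C++ sketch of the same brute-force loop together with a stand-in for R's tabulate(). It is only an illustration: enumerate_determinants and tabulate_positive are made-up names, not part of Rcpp, and the bound of 9 mirrors the loops above (values 0 through 8).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collect a*b - c*d for all a, b, c, d in [0, upper), in loop order.
std::vector<int> enumerate_determinants(int upper) {
    assert(upper > 0);
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(upper) * upper * upper * upper);
    for (int a = 0; a < upper; a++)
        for (int b = 0; b < upper; b++)
            for (int c = 0; c < upper; c++)
                for (int d = 0; d < upper; d++)
                    out.push_back(a * b - c * d);
    return out;
}

// Count how often each strictly positive value occurs; zeros and negative
// values are ignored, as R's tabulate() does. counts[k - 1] is the count of k.
std::vector<int> tabulate_positive(const std::vector<int>& values) {
    int max_value = 0;
    for (int x : values)
        if (x > max_value) max_value = x;
    std::vector<int> counts(static_cast<std::size_t>(max_value), 0);
    for (int x : values)
        if (x > 0) ++counts[static_cast<std::size_t>(x - 1)];
    return counts;
}
```

With upper set to 9 the loops produce 9^4 = 6561 determinants, so the remaining entries of the 10000-element vector passed in from R stay at zero and do not affect the tabulation.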

Edit 2 (some five+ years later):

So this answer just got an upvote and hence bubbled up in my queue. A lot of time has passed since I wrote it, and Rcpp has gotten a lot richer in features. So I very quickly wrote this

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::IntegerVector dd2(Rcpp::IntegerVector vec) {
    int n = vec.size(), i = 0;
    if (n != 10000) 
        throw std::length_error("Wrong vector size");
    for (int a = 0; a < 9; a++)
        for (int b = 0; b < 9; b++)
            for (int c = 0; c < 9; c++)
                for (int d = 0; d < 9; d++)
                    vec(i++) = a*b - c*d;
    return vec;
}

/*** R
x <- integer(10000)
tabulate( dd2(x) )
*/

which can be used as follows with the code in a file /tmp/dd.cpp

R> Rcpp::sourceCpp("/tmp/dd.cpp")    # or from any other file and path

R> x <- integer(10000)

R> tabulate( dd2(x) )
 [1]  87 132 105 155  93 158  91 161  72 104  45 147  41  96
[15]  72 120  36  90  32  87  67  42  26 120  41  36  27  75
[29]  20  62  16  69  19  28  49  45  12  18  11  57  14  48
[43]  10  18   7  12   6  46  23  10   4  10   4   6   3  38
[57]   2   4   2   3   2   2   1  17
R> 

Some of the key differences are:

  • simpler build: just sourceCpp() it; even executes R test code at the end
  • full-fledged IntegerVector type
  • exception-handling wrapper automatically added by sourceCpp() code generator
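
To give a rough idea of what such an exception-handling wrapper does, here is a plain C++ sketch. It is not the code that sourceCpp() actually generates; CallResult, guarded_call and checked_size are hypothetical names used only for illustration. The point is that no C++ exception is allowed to escape across the language boundary; it is turned into an error message instead, much like the manual try/catch in the first version above.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical result type: either a value or an error message.
struct CallResult {
    bool ok;
    int value;
    std::string error;
};

// Run fn and translate any C++ exception into an error message.
template <typename Fn>
CallResult guarded_call(Fn fn) {
    try {
        return {true, fn(), ""};
    } catch (const std::exception& ex) {
        return {false, 0, ex.what()};
    } catch (...) {
        return {false, 0, "unknown reason"};
    }
}

// Example function that throws on bad input, like the size check in dd2().
int checked_size(int n) {
    if (n != 10000) throw std::length_error("Wrong vector size");
    return n;
}

// Usage sketch: the failure comes back as an ordinary value.
inline void guarded_call_demo() {
    CallResult bad = guarded_call([] { return checked_size(5); });
    assert(!bad.ok && bad.error == "Wrong vector size");
}
```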