How to write an R function that evaluates an expression within a data-frame
Puzzle for the R cognoscenti: Say we have a data-frame:
df <- data.frame( a = 1:5, b = 1:5 )
I know we can do things like
with(df, a)
to get a vector of results.
But how do I write a function that takes an expression (such as a
or a > 3
) and does the same thing inside? That is, I want to write a function fn
that takes a data-frame and an expression as arguments and returns the result of evaluating the expression "within" the data-frame as an environment.
Never mind that this sounds contrived (I could just use with
as above); this is just a simplified version of a more complex function I am writing. I tried several variants (using eval
, with
, envir
, substitute
, local
, etc.), but none of them worked. For example, if I define fn
like so:
fn <- function(dat, expr) {
eval(expr, envir = dat)
}
I get this error:
> fn( df, a )
Error in eval(expr, envir = dat) : object 'a' not found
Clearly I am missing something subtle about environments and evaluation. Is there a way to define such a function?
The lattice package does this sort of thing in a different way. See, e.g., lattice:::xyplot.formula
.
fn <- function(dat, expr) {
eval(substitute(expr), dat)
}
fn(df, a) # 1 2 3 4 5
fn(df, 2 * a + b) # 3 6 9 12 15
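A sketch of why this works, with one addition that is not in the answer above: substitute(expr) captures the argument as an unevaluated call, and eval then looks names up in the data frame first. The name fn_enclos, the wrapper caller, and the explicit third argument parent.frame() are illustrative assumptions; the third argument makes names that are not columns resolve in the environment the function was called from.

```r
df <- data.frame(a = 1:5, b = 1:5)

# Hypothetical variant of fn: same idea, plus an explicit enclosure.
fn_enclos <- function(dat, expr) {
  e <- substitute(expr)          # the unevaluated call, e.g. a > threshold
  eval(e, dat, parent.frame())   # columns of dat first, then the caller's variables
}

caller <- function() {
  threshold <- 3                 # local to caller(), not a column of df
  fn_enclos(df, a > threshold)
}

caller()
# [1] FALSE FALSE FALSE  TRUE  TRUE
```

Without the explicit parent.frame(), a variable that exists only inside caller() would not be found, because eval would fall back to the function's own frame rather than the caller's.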
That's because you're not passing an expression.
Try:
fn <- function(dat, expr) {
mf <- match.call() # captures the call; mf$expr holds the unevaluated expr argument
eval(mf$expr, envir = dat)
}
> df <- data.frame( a = 1:5, b = 1:5 )
> fn( df, a )
[1] 1 2 3 4 5
> fn( df, a+b )
[1] 2 4 6 8 10
A quick glance at the source code of functions that use this technique (e.g. lm
) can reveal a lot more interesting things about it.
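To illustrate the point about not passing an expression: the function from the question does work if the caller quotes the argument, so that an unevaluated expression reaches eval. This is only a sketch; the name fn_quoted is made up here to avoid clashing with fn.

```r
df <- data.frame(a = 1:5, b = 1:5)

# Same body as the function in the question.
fn_quoted <- function(dat, expr) {
  eval(expr, envir = dat)
}

# quote() hands over the expression itself instead of trying to evaluate it
# in the calling environment, where no object named a exists.
fn_quoted(df, quote(a))
# [1] 1 2 3 4 5
fn_quoted(df, quote(a + b))
# [1]  2  4  6  8 10
```

The match.call() and substitute() approaches do this quoting inside the function, so the caller can write fn(df, a + b) directly.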
A late entry, but the data.table
approach and syntax would appear to be what you are after.
This is exactly how [.data.table
works with the j
, i
and by
arguments.
If you need it in the form fn(x,expr)
, then you can use the following
library(data.table)
DT <- data.table(a = 1:5, b = 2:6)
`[`(x=DT, j=a)
## [1] 1 2 3 4 5
`[`(x=DT, j=a * b)
## [1] 2 6 12 20 30
I think it is easier to use in its more native form:
DT[,a]
## [1] 1 2 3 4 5
and so on. In the background this is using substitute
and eval.
?within might also be of interest.
df <- data.frame( a = 1:5, b = 1:5 )
within(df, cx <- a > 3)
  a b    cx
1 1 1 FALSE
2 2 2 FALSE
3 3 3 FALSE
4 4 4  TRUE
5 5 5  TRUE
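For comparison, a small sketch of how within differs from with on the same data: with returns the value of the expression, while within returns a modified copy of the data frame. The variable names res_with and res_within are illustrative only.

```r
df <- data.frame(a = 1:5, b = 1:5)

res_with <- with(df, a > 3)            # a plain logical vector
res_within <- within(df, cx <- a > 3)  # a data frame with a new column cx

res_with
# [1] FALSE FALSE FALSE  TRUE  TRUE
names(res_within)
# [1] "a"  "b"  "cx"
```

So within is the better fit when the goal is to add or change columns, and with (or the eval-based fn above) when only the computed value is needed.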