开发者

Concatenate expressions to subset a dataframe

I am attempting to create a function that will calculate the mean of a column in a subsetted dataframe. The trick here is that I always to want to have a couple subsetting conditions and then have the option to pass more conditions to the functions to further subset the dataframe.

Suppose my data look like this:

dat <- data.frame(var1 = rep(letters, 26), var2 = rep(letters, each = 26), var3 = runif(26^2))

head(dat)
  var1 var2      var3
1    a    a 0.7506109
2    b    a 0.7763748
3    c    a 0.6014976
4    d    a 0.6229010
5    e    a 0.5648263
6    f    a 0.5184999

I want to be able to do the subset shown below, using the first condition in all function calls, and the second be something that can change with each function call. Additionally, the second subsetting condition could be on other variables (I'm using a single variable, var2, for parsimony, but the condition could involve multiple variables).

subset(dat, var1 %in% c('a', 'b', 'c') & var2 %in% c('a', 'b'))
   var1 var2      var3
1     a    a 0.7506109
2     b    a 0.7763748
3     c    a 0.6014976
27    a    b 0.7322357
28    b    b 0.4593551
29    c    b 0.2951004

My example function and function call would look something like:

getMean <- function(expr) {  
  return(with(subset(dat, var1 %in% c('a', 'b', 'c') eval(expr)), mean(var3)))  
}
getMean(expression(& var2 %in% c('a', 'b')))

An alternative call could look like:

getMean(expression(& var4 < 6 & var5 > 10))

Any help is much appreciated.


EDIT: With Wojciech Sobala's help, I came up with the following function, which gives me the option of passing in 0 or more conditions.

getMean <- function(开发者_C百科expr = NULL) {
  sub <- if(is.null(expr)) { expression(var1 %in% c('a', 'b', 'c'))
  } else expression(var1 %in% c('a', 'b', 'c') & eval(expr))
  return(with(subset(dat, eval(sub)), mean(var3)))
}
getMean()
getMean(expression(var2 %in% c('a', 'b')))


It can be simplified with defalut expr=TRUE.

getMean <- function(expr = TRUE) {
  return(with(subset(dat, var1 %in% c('a', 'b', 'c') & eval(expr)), mean(var3)))
}


This is how I would approach it. The function getMean makes use of the R's handy default parameter settings:

getMean <- function(x, subset_var1, subset_var2=unique(x$var2)){
    xs <- subset(x, x$var1 %in% subset_var1 & x$var2 %in% subset_var2)

    mean(xs$var3)
}

getMean(dat, c('a', 'b', 'c'))
[1] 0.4762141

getMean(dat, c('a', 'b', 'c'), c('a', 'b'))
[1] 0.3814149
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜