
R (statistical) scoping error using transformBy(), part of the doBy package

I think I'm getting a scoping error when using transformBy(), part of the doBy package for R. Here is a simple example of the problem:

> library(doBy)
>
> test.data = data.frame(
+  herp = c(1,2,3,4,5),
+  derp = c(2,3,1,3,5)
+ )
>
> transformData = function(data){
+ 
+  five = 5
+ 
+  transformBy(
+   ~ herp,
+   data=data,
+   sum=herp + derp + five
+  )
+ }
>
> transformData(test.data)
Error in eval(expr, envir, enclos) : object 'five' not found

When I run transformBy() within a sub-scope (non-global scope), no local variables or functions seem to be available for use in transformBy. If, on the other hand, I define those variables or functions globally, they become available. Here is a slightly modified example that works:

> library(doBy)
>
> test.data = data.frame(
+  herp = c(1,2,3,4,5),
+  derp = c(2,3,1,3,5)
+ )
>
> five = 5
>
> transformData = function(data){
+  transformBy(
+   ~ herp,
+   data=data,
+   sum=herp + derp + five
+  )
+ }
>
> transformData(test.data)
  herp derp sum
1    1    2   8
2    2    3  10
3    3    1   9
4    4    3  12
5    5    5  15

Am I misunderstanding something about how transformBy is supposed to work or is something broken?

Versions:

  • ubuntu: 8.04 (x64)
  • R: 2.10.1
  • doBy: 4.0.5


This is clearly documented in the ?transformBy help page, and therefore not a bug.

Details:

 The ... arguments are tagged vector expressions, which are
 evaluated in the data frame data. The tags are matched against
 names(data), and for those that match, the value replace the
 corresponding variable in data, and the others are appended to
 data.

Simply make the object "five" a part of the data.frame "data", and it will work as you expect. Currently the function is trying to evaluate "five" in the "data" data.frame, which fails of course.
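A minimal base-R sketch of why moving the value into the data frame helps. It does not call transformBy() itself; eval_in_data and add_sum are hypothetical helpers, and baseenv() is used as the fallback environment only to show that the lookup of five never needs to leave the data frame.

```r
# Hypothetical helper: evaluate an expression in `data`, falling back only
# to base R, so nothing from the calling function is visible.
eval_in_data <- function(data, expr) eval(expr, data, baseenv())

test.data <- data.frame(
  herp = c(1, 2, 3, 4, 5),
  derp = c(2, 3, 1, 3, 5)
)

add_sum <- function(data) {
  data$five <- 5          # the constant becomes a (recycled) column
  data$sum <- eval_in_data(data, quote(herp + derp + five))
  data$five <- NULL       # drop the helper column again
  data
}

result <- add_sum(test.data)
print(result)
```

The cost of this approach is a temporary extra column, which has to be removed afterwards if it should not appear in the result.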


I would consider this a bug in the transformBy function. If you look at the source of transformBy, it creates a subfunction called transform2 which evaluates the last argument first in the context of the dataframe, with the parent.frame() as the enclosing environment. It then calls lapply on transform2.

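Because parent.frame() is evaluated inside transform2, and transform2 is called by lapply, the enclosing environment ends up being lapply's frame rather than the frame of the function that called transformBy. Combined with R's lexical scoping rules (see http://cran.r-project.org/doc/manuals/R-intro.html#Scope), the effective lookup order is data, then lapply, then the global environment. I think the right fix is to add a statement of the form pf <- parent.frame() outside of the transform2 definition and then reference pf in the eval statement.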
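The difference between calling parent.frame() inside the helper and capturing it beforehand can be reproduced in base R without doBy. The sketch below is an illustration only: broken_transform, fixed_transform, and caller are hypothetical names, and the code mirrors the pattern described above rather than doBy's actual source.

```r
# Pattern described above: parent.frame() is resolved inside the helper,
# which is called by lapply, so it points at lapply's frame.
broken_transform <- function(data, expr) {
  e <- substitute(expr)
  helper <- function(d) eval(e, d, parent.frame())
  lapply(list(data), helper)[[1]]
}

# Proposed fix: capture the caller's frame before defining the helper.
fixed_transform <- function(data, expr) {
  e <- substitute(expr)
  pf <- parent.frame()   # frame of whoever called fixed_transform
  helper <- function(d) eval(e, d, pf)
  lapply(list(data), helper)[[1]]
}

test.data <- data.frame(
  herp = c(1, 2, 3, 4, 5),
  derp = c(2, 3, 1, 3, 5)
)

# `five` exists only inside this function, as in the question.
caller <- function(data, f) {
  five <- 5
  f(data, herp + derp + five)
}

# The broken variant cannot see `five`; keep the error message instead.
broken_result <- tryCatch(
  caller(test.data, broken_transform),
  error = function(err) conditionMessage(err)
)
fixed_result <- caller(test.data, fixed_transform)

print(broken_result)
print(fixed_result)
```

With no global five defined, the first call fails while the second finds five in the caller's frame, which matches the behaviour reported in the question.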


This problem comes up in a variety of ways. Apparently, something strange is going on with the scoping in R.

Edit: It is not R's scoping that works differently from what I naively expected, but transformBy()'s. See Erik's answer.

I get around it by assigning a temporary environment in the global environment, something like:

transformData = function(data){

  # temporary environment in the global environment, where transformBy can see it
  temp_env <<- new.env(hash=TRUE) # hashed environment for easy access
  on.exit(rm(temp_env, envir=.GlobalEnv)) # cleanup, even if transformBy fails
  temp_env$five = 5

  transformBy(
   ~ herp,
   data=data,
   sum=herp + derp + temp_env$five
  )
}