R (statistical) scoping error using transformBy(), part of the doBy package
I think I'm getting a scoping error when using transformBy(), part of the doBy package for R. Here is a simple example of the problem:
> library(doBy)
>
> test.data = data.frame(
+ herp = c(1,2,3,4,5),
+ derp = c(2,3,1,3,5)
+ )
>
> transformData = function(data){
+
+ five = 5
+
+ transformBy(
+ ~ herp,
+ data=data,
+ sum=herp + derp + five
+ )
+ }
>
> transformData(test.data)
Error in eval(expr, envir, enclos) : object 'five' not found
When I run transformBy() within a sub-scope (non-global scope), no local variables or functions seem to be available for use in transformBy(). If, on the other hand, I define those variables or functions globally, they become available. Here is a slightly modified example that works:
> library(doBy)
>
> test.data = data.frame(
+ herp = c(1,2,3,4,5),
+ derp = c(2,3,1,3,5)
+ )
>
> five = 5
>
> transformData = function(data){
+ transformBy(
+ ~ herp,
+ data=data,
+ sum=herp + derp + five
+ )
+ }
>
> transformData(test.data)
herp derp sum
1 1 2 8
2 2 3 10
3 3 1 9
4 4 3 12
5 5 5 15
Am I misunderstanding something about how transformBy is supposed to work or is something broken?
Versions:
- ubuntu: 8.04 (x64)
- R: 2.10.1
- doBy: 4.0.5
This is clearly documented in the ?transformBy help page, and therefore not a bug.
Details:
The ... arguments are tagged vector expressions, which are
evaluated in the data frame data. The tags are matched against
names(data), and for those that match, the value replace the
corresponding variable in data, and the others are appended to
data.
Simply make the object "five" a part of the data.frame "data", and it will work as you expect. Currently the function is trying to evaluate "five" in the "data" data.frame, which fails of course.
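To make the suggestion concrete, here is a minimal sketch in base R of what "evaluated in the data frame" means. It is a rough model of the lookup, not doBy's actual implementation: eval_in_data is a hypothetical helper, and using baseenv() as the enclosure is an assumption chosen only to keep local variables out of reach.

```r
# Base R only; eval_in_data is a hypothetical helper, not part of doBy.
test.data <- data.frame(
  herp = c(1, 2, 3, 4, 5),
  derp = c(2, 3, 1, 3, 5)
)

# Evaluate an expression with the data frame as the first place to look.
# baseenv() is the enclosure, so operators such as `+` resolve, but
# variables local to some calling function do not.
eval_in_data <- function(expr, data) {
  eval(expr, data, baseenv())
}

expr <- quote(herp + derp + five)

# 'five' is not a column yet, so the lookup fails and the error
# message is captured instead of stopping the script.
msg <- tryCatch(
  eval_in_data(expr, test.data),
  error = function(e) conditionMessage(e)
)
print(msg)

# Once 'five' is a column of the data frame, the same expression evaluates.
with.five <- cbind(test.data, five = 5)
sums <- eval_in_data(expr, with.five)
print(sums)
```

In the question's transformData(), the equivalent step would be adding a five column to data before calling transformBy(); the extra column then also appears in the result and may need to be dropped afterwards.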
I would consider this a bug in the transformBy function. If you look at the source of transformBy, it creates a subfunction called transform2 which evaluates the last argument first in the context of the dataframe, with the parent.frame() as the enclosing environment. It then calls lapply on transform2.
Since R uses lexical scoping semantics (see http://cran.r-project.org/doc/manuals/R-intro.html#Scope), the effective scope hierarchy is data, then lapply, then global. I think the right fix is to add a statement of the form pf <- parent.frame() outside of the transform2 definition and then reference pf in the eval statement.
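The effect can be reproduced without doBy. The following is a minimal sketch in base R, assuming a simplified structure: by_group_broken, by_group_fixed, helper and the two caller functions are hypothetical stand-ins written for illustration, not the actual transformBy source.

```r
test.data <- data.frame(
  herp = c(1, 2, 3, 4, 5),
  derp = c(2, 3, 1, 3, 5)
)

# Broken variant: parent.frame() is resolved inside the helper.
by_group_broken <- function(data, expr) {
  expr <- substitute(expr)  # capture the unevaluated expression
  helper <- function(piece) {
    # The helper is called by lapply(), so parent.frame() here is
    # lapply's frame, not the frame of the user's function.
    eval(expr, piece, parent.frame())
  }
  lapply(split(data, data$herp), helper)
}

# Fixed variant: capture the caller's frame before defining the helper.
by_group_fixed <- function(data, expr) {
  expr <- substitute(expr)
  pf <- parent.frame()      # frame of whoever called by_group_fixed
  helper <- function(piece) {
    eval(expr, piece, pf)
  }
  lapply(split(data, data$herp), helper)
}

caller_broken <- function(data) {
  five <- 5
  by_group_broken(data, herp + derp + five)
}

caller_fixed <- function(data) {
  five <- 5
  by_group_fixed(data, herp + derp + five)
}

# The broken variant cannot see the local 'five'; capture the message.
broken.msg <- tryCatch(
  caller_broken(test.data),
  error = function(e) conditionMessage(e)
)
print(broken.msg)

# The fixed variant finds 'five' in the caller's frame.
fixed.sums <- unlist(caller_fixed(test.data), use.names = FALSE)
print(fixed.sums)
```

With the question's data, the fixed variant yields the sums 8, 10, 9, 12, 15, matching the output the question gets when five is defined globally. The only difference between the two variants is where parent.frame() is called.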
This is a problem that comes up in a variety of ways. Apparently, something strange is going on with the scoping in R.
Edit: It is not R's scoping that works differently from what I naively expected, but that of transformBy(). See Erik's answer.
I get around it by assigning a temporary environment in the global environment, something like:
transformData = function(data){
  # Create the environment globally, where transformBy() can find it
  temp_env <<- new.env(hash=TRUE)
  # Clean up on exit, even if transformBy() fails
  on.exit(rm(temp_env, envir=.GlobalEnv))
  temp_env$five = 5
  out <- transformBy(
    ~ herp,
    data=data,
    sum=herp + derp + temp_env$five
  )
  return(out)
}