How does R handle objects in function calls?
I have a background in Java and Python and have recently started learning R.
Today I found that R seems to handle objects quite differently from Java and Python.
For example, the following code:
x <- c(1:10)
print(x)
sapply(1:10,function(i){
x[i] = 4
})
print(x)
The code gives the following result:
[1] 1 2 3 4 5 6 7 8 9 10
[1] 1 2 3 4 5 6 7 8 9 10
But I expected the second line of output to be all 4s, since I modified the vector inside the sapply function.
So does this mean that R makes copies of objects in function calls instead of passing references to them?
x is defined in the global environment, not in your function.
If you try to modify a non-local object such as x in a function, then R makes a copy of the object and modifies the copy, so each time you run your anonymous function a copy of x is made and its ith component is set to 4. When the function exits, the copy that was made disappears forever. The original x is not modified.
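A minimal sketch of that behaviour, in case it helps to see it outside sapply. The function name set_fourth is made up for illustration; it returns its local copy so the copy can be compared with the global x.

```r
x <- 1:10

# set_fourth is a hypothetical helper, not part of the original question.
set_fourth <- function(i) {
  x[i] <- 4   # assignment creates a local x (a modified copy of the global one)
  x           # return the local copy so it can be inspected
}

local_copy <- set_fourth(3)
print(local_copy)   # the third element is 4 in the copy
print(x)            # the global x is unchanged
```

The global x only changes if the result is assigned back, for example x <- set_fourth(3).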
If we were to write x[i] <<- 4
or if we were to write x[i] <- 4; assign("x", x, .GlobalEnv)
then R would write it back. Another way to write it back would be to set e, say, to the environment that x is stored in and do this:
e <- environment()
sapply(1:10, function(i) e$x[i] <- 4)
or possibly this:
sapply(1:10, function(i, e) e$x[i] <- 4, e = environment())
Normally one does not write such code in R. Rather one produces the result as the output of the function like this:
x <- sapply(1:10, function(i) 4)
(Actually, in this case one could simply write x[] <- 4.)
ADDED:
Using the proto package one could do this, where method f sets the ith component of the x property to 4.
library(proto)
p <- proto(x = 1:10, f = function(., i) .$x[i] <- 4)
for(i in seq_along(p$x)) p$f(i)
p$x
ADDED:
Added above another option in which we explicitly pass the environment that x is stored in.
Yes, you're right. Check the R Language Definition: 4.3.3 Argument Evaluation
AFAIK, R doesn't actually copy the data until you try to modify it, thus following copy-on-write semantics.
Once you assign to it, the x inside the anonymous function is no longer the x in the global environment (your workspace). It is a copy of x, local to the anonymous function. It is not as simple as saying that R copies objects in function calls: R strives not to copy if it can, but once you modify something R has to copy the object.
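Here is a small sketch of the copy-on-modify behaviour described above. It uses tracemem() to report when a vector is duplicated; that function only works if R was built with memory profiling, which I am assuming is the case on most standard builds, so the sketch checks capabilities("profmem") first.

```r
x <- c(1, 2, 3)

# Tracing is optional: it needs an R build with memory profiling enabled.
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(x))   # report whenever this vector is duplicated

y <- x        # no copy yet: both names refer to the same vector
y[1] <- 99    # the first modification forces the copy (tracemem reports it here)

if (can_trace) untracemem(x)

print(x)      # x keeps its original values
print(y)      # only y was changed
```

The same thing happens with function arguments: the copy is only made if the function modifies the object.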
As @DWin points out, the sapply() call does return a vector of 4s (each call to the anonymous function returns the value of its assignment, 4, and sapply() collects those values), so your claimed output is not what I get:
> x <- c(1:10)
> print(x)
[1] 1 2 3 4 5 6 7 8 9 10
> sapply(1:10,function(i){
+ x[i] = 4
+ })
[1] 4 4 4 4 4 4 4 4 4 4
> print(x)
[1] 1 2 3 4 5 6 7 8 9 10
Clearly, the code did almost what you thought it would. The problem is that the output from sapply() was not assigned to an object and hence is printed and then discarded.
The reason your code even works is due to the scoping rules of R. You really should pass in to a function, as arguments, any objects that the function needs. However, if R can't find an object local to the function, it will search the enclosing environment for an object matching the name, and then the parent of that environment if appropriate, eventually hitting the global environment, the workspace. So your code works because it eventually found an x to work with, but x was copied as soon as it was modified, and that local copy was discarded when each call to the anonymous function returned.
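To illustrate the scoping point, here is a rough sketch comparing a function that relies on finding x through scoping with one that takes everything it needs as arguments. Both function names are invented for this example.

```r
x <- 1:10

# Relies on scoping: x is not an argument, so R looks it up outside the
# function and finds it in the global environment.
uses_global <- function(i) x[i] * 2

# Preferred style: the vector is passed in explicitly.
uses_argument <- function(v, i) v[i] * 2

res_global   <- uses_global(3)
res_argument <- uses_argument(x, 3)
print(res_global)
print(res_argument)
```

Both give the same answer here, but the second version keeps working if x is renamed or the function is moved somewhere else.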
This copying does take time and memory in many cases. This is one of the reasons people think for loops are slow in R: they don't allocate storage for the result before filling it in the loop. If you don't allocate storage, R has to copy and extend the object to add each new result of the loop.
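As a rough sketch of the difference, the two loops below compute the same values, once by growing the result and once by filling preallocated storage. The variable names are just for illustration, and how much slower the first form is will depend on your R version and on n.

```r
n <- 1e4

# Growing the result: the vector has to be extended as the loop runs.
grown <- c()
for (i in seq_len(n)) grown[i] <- i^2

# Preallocating: storage is created once and then filled in.
filled <- numeric(n)
for (i in seq_len(n)) filled[i] <- i^2

print(identical(grown, filled))
```

Wrapping each loop in system.time() is a simple way to compare them on your own machine.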
Again though, it isn't always that simple everywhere in R. Environments are one exception: a "copy" of an environment really just refers to the original:
> a <- new.env()
> a
<environment: 0x1af2ee0>
> b <- 4
> assign("b", b, env = a)
> a$b
[1] 4
> c <- a ## copy the environment to `c`
> assign("b", 7, env = c) ## assign something to `b` in env `c`
> c$b ## as expected
[1] 7
> a$b ## also changed `b` in `a` as `a` and `c` are actually the same thing
[1] 7
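Because of this, an environment passed to a function can be modified in place. A minimal sketch, assuming we keep the vector inside an environment; store and set_all are hypothetical names, not anything from base R.

```r
# store and set_all are made-up names for this illustration.
store <- new.env()
store$x <- 1:10

set_all <- function(env, value) {
  env$x[] <- value   # env is the same environment object the caller holds
  invisible(NULL)
}

set_all(store, 4)
print(store$x)       # every element is now 4
```

This is essentially the same idea as the e$x[i] <- 4 approach in the other answer, with the environment passed in as an argument.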
If you want to understand these sorts of things, I suggest reading the R Language Definition manual, which covers many of the details of what goes on under the hood in R.
You need to assign the output of sapply to an object, otherwise it just disappears. (Actually, you can recover it, since it also gets assigned to .Last.value.)
x <- c(1:10)
print(x)
[1] 1 2 3 4 5 6 7 8 9 10
x <- sapply(1:10,function(i){
x[i] = 4
})
print(x)
[1] 4 4 4 4 4 4 4 4 4 4
If you want to change a "global" object from within a function then you can use non-local assignment.
x <- c(1:10)
print(x)
# [1] 1 2 3 4 5 6 7 8 9 10
sapply(1:10,function(i){
x[i] <<- 4
})
print(x)
# [1] 4 4 4 4 4 4 4 4 4 4
Although in this particular case you could write it more compactly as x[] <- 4.
That is, by the way, one of the nice features of R: instead of sapply(1:10, function(i) x[i] <<- 4) or for(i in 1:10) x[i] <- 4 (for is not a function, so you don't need <<- here) you can just write x[] <- 4 :)
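One thing worth noting about that shortcut: the empty brackets matter. A quick sketch of the difference between filling a vector and replacing it:

```r
x <- 1:10
x[] <- 4           # fills every element; the length is kept
print(length(x))

y <- 1:10
y <- 4             # replaces the whole object with a single value
print(length(y))
```

So x[] <- 4 keeps the vector at length 10, while plain y <- 4 leaves you with a vector of length 1.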