How can I neatly clean my R workspace while preserving certain objects?
Suppose I'm messing about with some data by binding vectors together, as I'm wont to do on a lazy Sunday afternoon.
x <- rnorm(25, mean = 65, sd = 10)
y <- rnorm(25, mean = 75, sd = 7)
z <- 1:25
dd <- data.frame(mscore = x, vscore = y, caseid = z)
I've now got my new dataframe dd, which is wonderful. But there's also still the detritus from my prior slicings and dicings:
> ls()
[1] "dd" "x" "y" "z"
What's a simple way to clean up my workspace if I no longer need my "source" columns, but I want to keep the dataframe? That is, now that I'm done manipulating data I'd like to just have dd and none of the smaller variables that might inadvertently mask further analysis:
> ls()
[1] "dd"
I feel like the solution must be of the form rm(ls[ -(dd) ]) or something, but I can't quite figure out how to say "please clean up everything BUT the following objects."
I would approach this by making a separate environment in which to store all the junk variables, building your data frame inside it using with(), then copying the objects you want to keep into the main environment. This has the advantage of being tidy, while still keeping all your objects around in case you want to look at them again.
temp <- new.env()
with(temp, {
x <- rnorm(25, mean = 65, sd = 10)
y <- rnorm(25, mean = 75, sd = 7)
z <- 1:25
dd <- data.frame(mscore = x, vscore = y, caseid = z)
}
)
dd <- with(temp,dd)
This gives you:
> ls()
[1] "dd" "temp"
> with(temp,ls())
[1] "dd" "x" "y" "z"
And of course you can get rid of the junk environment with rm(temp) if you really want to.
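If the scratch variables are never needed again, a related alternative is base R's local(), which is not what the answer above uses. The sketch below assumes the same simulated columns as the question; the intermediate vectors live only inside the local() block and only the data frame is returned to the workspace.

```r
# Build dd inside a throwaway scope; x, y and z never reach the workspace.
dd <- local({
  x <- rnorm(25, mean = 65, sd = 10)
  y <- rnorm(25, mean = 75, sd = 7)
  z <- 1:25
  # The value of the last expression is what local() returns.
  data.frame(mscore = x, vscore = y, caseid = z)
})

names(dd)
```

The trade-off against the temp environment approach is that the intermediates cannot be inspected afterwards.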
Here is an approach using setdiff:
rm(list=setdiff(ls(), "dd"))
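The same idea extends to keeping several objects at once. In this minimal sketch, `keep` and `other` are hypothetical names added for illustration and are not part of the original post. Note that ls() skips names beginning with a dot by default, so such hidden objects are left alone.

```r
x <- rnorm(25, mean = 65, sd = 10)
y <- rnorm(25, mean = 75, sd = 7)
z <- 1:25
dd <- data.frame(mscore = x, vscore = y, caseid = z)
other <- summary(dd)  # hypothetical second object worth keeping

# Names to preserve (hypothetical helper variable).
keep <- c("dd", "other")

# Everything not listed in `keep` is removed, including `keep` itself.
rm(list = setdiff(ls(), keep))

ls()
```

Running setdiff(ls(), keep) on its own first is a cheap way to preview what would be deleted.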
Since I forgot that comments don't support full formatting, I wanted to respond to Hadley's recommendation here. Some of my existing code--perhaps sloppily--tends to work like this:
caseid <- 1:25
height <- rnorm(25, mean = 150, sd = 15)
hd <- data.frame(caseid, height)
hd <- hd[-7, ] # Removing a case
library(ggplot2)
qplot(x = caseid, y = height, data = hd) # Plots 25 points
In the above code, qplot() will plot 25 points, and I think it's because my global variables caseid and height are masking its attempt to access them locally from the provided dataframe. So, the case that I removed still seems to get plotted, because it appears in the global variables, though not the dataframe hd at the time of the qplot() call.
My sense is that this behavior is entirely expected, and that the answer here is that I'm following a suboptimal coding practice. So, how can I start writing code that avoids these kinds of inadvertent collisions?
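One possible habit, sketched here on the assumption that the standalone vectors are not needed for anything else, is to create the columns directly inside the data.frame() call. No global caseid or height is ever created, so there is nothing in the workspace with the same names to be picked up by accident.

```r
# Define the columns inside data.frame(); no global caseid/height is created.
hd <- data.frame(
  caseid = 1:25,
  height = rnorm(25, mean = 150, sd = 15)
)
hd <- hd[-7, ]  # remove case 7

nrow(hd)  # 24

# Column access goes through the data frame explicitly.
mean(hd$height)
```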