Subsetting data frame using variable with same name as column
I have a data frame and I'm trying to run a subset on it. In my data frame, I have a column called "start" and I'm trying to do this:
sub <- subset(data,data$start==14)
and I correctly get a subset of all the rows where start=14.
But, when I do this:
for(start in seq(1,20,by=1)) {
sub <- subset(data,data$start==start)
print(sub)
}
it does not correctly find the subsets. It just prints the entire data frame.
Why is this and how do I fix it?
You can also specify the environment you're working with:
x<-data.frame(
start=sample(3,20,replace=TRUE),
someValue=runif(20))
env<-environment()
start<-3
cat("\nDefault scope:")
print(subset(x,start==start)) # all entries, as start==start is evaluated to TRUE
cat("\nSpecific environment:")
print(subset(x,start==get('start',env))) # get() looks up the second start in the saved environment env, so this is equivalent to subset(x,start==3)
Fixing it is easy. Just rename either your for loop counter or your data frame column to something other than start.
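A minimal sketch of the renaming fix, assuming a small made-up data frame (the original data is not shown) and an arbitrary counter name s:

```r
# Hypothetical example data; only the column name start comes from the question.
data <- data.frame(start = c(1, 2, 2, 3), value = c(10, 20, 30, 40))

# The loop counter is named s, so it can no longer be confused with the
# column start when subset() evaluates the condition.
for (s in 1:3) {
  sub <- subset(data, start == s)
  print(sub)
}
```

Because the data frame has no column called s, subset falls back to the calling environment for it and picks up the loop counter.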
This happens because subset evaluates the expression data$start == start inside the data frame data. It finds the column start and stops there, never seeing the other variable start you defined in the for loop.
Perhaps a better insight into why R gets confused here is to note that when using subset you don't in general need to refer to variables using data$. So imagine telling R:
subset(data,start == start)
R is just going to evaluate both of those starts inside data and get a vector of all TRUEs back.
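The effect can be checked directly. This is a small sketch on made-up data, using with() only to expose the logical vector that the condition produces when evaluated inside the data frame:

```r
# Hypothetical example data; only the column name start comes from the question.
data <- data.frame(start = c(1, 2, 2, 3), value = c(10, 20, 30, 40))
start <- 2  # a variable outside the data frame with the same name as the column

# Both sides of == resolve to the column, so the column is compared with itself.
cond <- with(data, start == start)
print(cond)

# Either spelling of the condition therefore keeps every row.
all_rows <- subset(data, start == start)
same_rows <- subset(data, data$start == start)
print(nrow(all_rows))
print(nrow(same_rows))
```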
Another approach is to use bracket subsetting rather than the subset function.
for(start in seq(1,20,by=1)) {
sub <- data[data$start==start,]
print(sub)
}
subset has non-standard evaluation rules, which is what leads to the scoping problem you are seeing (to which start are you referring?). If there are (or may be) NAs in data$start, a plain data$start==start index would also return rows filled with NA, so you probably need
sub <- data[!is.na(data$start) & data$start==start,]
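To see why the NA guard matters, here is a short sketch on made-up data with one missing value. The counter name s and the which() variant are illustrative additions, not part of the original answer:

```r
# Hypothetical example data with a missing value in start.
data <- data.frame(start = c(1, NA, 2, 2), value = c(10, 20, 30, 40))
s <- 2  # stand-in for the loop counter

# NA == 2 is NA, and indexing rows with NA yields a row made entirely of NA.
naive <- data[data$start == s, ]

# Guarding with !is.na() turns that NA into FALSE, so the row is dropped.
guarded <- data[!is.na(data$start) & data$start == s, ]

# which() returns only the positions that are TRUE, which also drops the NA.
via_which <- data[which(data$start == s), ]

print(naive)
print(guarded)
print(via_which)
```

The subset function drops rows where the condition is NA on its own, which is why this only becomes visible after switching to bracket subsetting.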
Note this warning from the subset help page:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.