Checking for defined values in subset()
I'm trying to get a subset of a data frame based on values that are, to borrow from SQL, not null. I tried something like:
lately <- subset(data, year > 1997 & myvalue != NA)
But that's not right. Any tips, r'sters?
subset(data, year > 1997 & !is.na(myvalue))
should do it. The reason your version doesn't work is that foo != NA or foo == NA always evaluates to NA, because we don't know what value the NA datum stands for. Use is.na() to test for NA, and negate it with ! if you want "not NA".
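The following is a minimal sketch of the comparison rule described above, using a small hypothetical vector x rather than the data frame from the question. It shows that comparing against NA never yields TRUE or FALSE, while is.na() does.

```r
# Hypothetical example vector with one missing value
x <- c(1, 3, NA, 6)

# Any comparison with NA yields NA, whatever the other operand is
x != NA        # NA NA NA NA
x == NA        # NA NA NA NA
NA == NA       # NA: two unknowns are not known to be equal

# is.na() is the reliable test; ! negates it to mean "not NA"
is.na(x)       # FALSE FALSE  TRUE FALSE
!is.na(x)      # TRUE  TRUE FALSE  TRUE
x[!is.na(x)]   # 1 3 6
```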
E.g.:
> dat <- data.frame(year = 1995:2000, myvalue = c(1,3,4,NA,6,10))
> dat
  year myvalue
1 1995       1
2 1996       3
3 1997       4
4 1998      NA
5 1999       6
6 2000      10
> subset(dat, year > 1997 & myvalue != NA)
[1] year    myvalue
<0 rows> (or 0-length row.names)
> subset(dat, year > 1997 & !is.na(myvalue))
  year myvalue
5 1999       6
6 2000      10
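If you prefer not to use subset(), the same filter can be written with plain [ indexing. This is a sketch rebuilding the example dat from above; the names via_subset, via_index and via_complete are illustrative only. complete.cases() (from the stats package, attached by default) is shown as one alternative: it drops rows with an NA in any column, which only matches here because myvalue is the only column containing NAs.

```r
dat <- data.frame(year = 1995:2000, myvalue = c(1, 3, 4, NA, 6, 10))

# The filter from the answer, using subset()
via_subset <- subset(dat, year > 1997 & !is.na(myvalue))

# Same filter with [ indexing; !is.na() already removed the unknowns,
# so the logical index contains only TRUE/FALSE
via_index <- dat[dat$year > 1997 & !is.na(dat$myvalue), ]

# complete.cases() is TRUE for rows with no NA in any column
via_complete <- dat[dat$year > 1997 & complete.cases(dat), ]

via_index     # rows 5 and 6 (years 1999 and 2000)
via_complete  # rows 5 and 6 (years 1999 and 2000)
```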
It is instructive to ponder further on why your version doesn't work.
The first part of the condition returns:
> with(dat, year > 1997)
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
For the first three elements no further checking is needed, since FALSE & anything is FALSE, but the second clause has to be checked for the final three elements. The second clause returns NA for all elements, as discussed above:
> with(dat, myvalue != NA)
[1] NA NA NA NA NA NA
Hence the combined clause returns:
> with(dat, year > 1997 & myvalue != NA)
[1] FALSE FALSE FALSE    NA    NA    NA
which ends up selecting no rows, because subset() treats an NA in the condition as FALSE; hence the zero-row object returned for your example.
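To make the last step concrete, here is a small sketch of R's three-valued & and of how the NA-laden condition behaves in subset() versus plain [ indexing. The variable name cond is illustrative only. The contrast matters because [ does not quietly drop NA indices the way subset() does: each NA index becomes a row filled with NA.

```r
dat <- data.frame(year = 1995:2000, myvalue = c(1, 3, 4, NA, 6, 10))

# & uses three-valued logic: FALSE & NA is FALSE, TRUE & NA stays NA
FALSE & NA   # FALSE
TRUE  & NA   # NA

cond <- with(dat, year > 1997 & myvalue != NA)
cond         # FALSE FALSE FALSE NA NA NA

# subset() treats NA in the condition as FALSE, so no rows survive
nrow(subset(dat, year > 1997 & myvalue != NA))   # 0

# [ indexing keeps NA indices, producing 3 rows whose values are all NA
dat[cond, ]
```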