
Checking for defined values in subset()

I'm trying to get a subset of a data frame based on, to borrow from SQL, values that are not null. I tried something like:

lately <- subset(data, year > 1997 & myvalue != NA)

But that's not right. Any tips, r'sters?


subset(data, year > 1997 & !is.na(myvalue))

should do it. Your version doesn't work because any comparison with NA, such as foo != NA or foo == NA, always returns NA: NA stands for an unknown value, so R cannot say whether it equals anything. Use is.na() to test for NA, and negate it with ! if you want "not NA".

E.g.:

> dat <- data.frame(year = 1995:2000, myvalue = c(1,3,4,NA,6,10))
> dat
  year myvalue
1 1995       1
2 1996       3
3 1997       4
4 1998      NA
5 1999       6
6 2000      10
> subset(dat, year > 1997 & myvalue != NA)
[1] year    myvalue
<0 rows> (or 0-length row.names)
> subset(dat, year > 1997 & !is.na(myvalue))
  year myvalue
5 1999       6
6 2000      10
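
As a minimal sketch of the comparison behaviour described above (the vector x is made up for illustration and is not part of your data):

```r
# Hypothetical vector with one missing value
x <- c(1, NA, 3)

# Comparing against NA never yields TRUE or FALSE, only NA
cmp_ne <- x != NA       # NA NA NA
cmp_eq <- x == NA       # NA NA NA

# is.na() is the test that actually identifies missing values
missing <- is.na(x)     # FALSE TRUE FALSE

# Negate it to keep the defined values
defined <- x[!is.na(x)] # 1 3
```

The same !is.na() idiom is what goes inside the subset() condition.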

It is instructive to ponder further on why your version doesn't work.

The first part of the clause returns:

> with(dat, year > 1997)
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

The first three elements are FALSE, so the combined result is FALSE for them whatever the second clause returns (FALSE & NA is FALSE). For the final three elements the result depends on the second clause, which returns NA for all elements, as discussed above:

> with(dat, myvalue != NA)
[1] NA NA NA NA NA NA

Hence the combined clause returns:

> with(dat, year > 1997 & myvalue != NA)
[1] FALSE FALSE FALSE    NA    NA    NA

subset() treats an NA in the condition as FALSE, so no rows are selected, hence the zero-row object returned for your example.
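
To see how the NA values in the condition are handled, here is a small sketch using the same dat as above. The variable name keep is only for illustration. It also shows that plain [ indexing behaves differently from subset() when the condition contains NA:

```r
dat <- data.frame(year = 1995:2000, myvalue = c(1, 3, 4, NA, 6, 10))

# Three-valued logic: FALSE wins over NA, TRUE does not
f_and_na <- FALSE & NA   # FALSE
t_and_na <- TRUE & NA    # NA

# The condition from the question: FALSE FALSE FALSE NA NA NA
keep <- with(dat, year > 1997 & myvalue != NA)

# subset() drops rows where the condition is NA
by_subset <- subset(dat, keep)      # zero rows

# Plain [ indexing keeps NA positions as all-NA rows instead
by_bracket <- dat[keep, ]           # three rows filled with NA

# which() discards NA, so this matches what subset() does
by_which <- dat[which(keep), ]      # zero rows
```

So with [ indexing the mistake shows up as rows of NA rather than an empty result; either way, !is.na(myvalue) is the fix.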
