Checking for defined values in subset()

2023-03-18 18:05 Asked by:

Trying to get a subset of a data frame based on, to borrow from SQL, values that are not null. Trying something like:

lately <- subset(data, year > 1997 & myvalue != NA)

But that's not right. Any tips, r'sters?

subset(data, year > 1997 & !is.na(myvalue))

should do it. The reason your version doesn't work is that foo != NA and foo == NA always evaluate to NA: NA marks an unknown value, so R cannot tell whether it equals anything. Use is.na() to test for NA, and negate it with ! if you want "not NA".

E.g.:

> dat <- data.frame(year = 1995:2000, myvalue = c(1,3,4,NA,6,10))
> dat
  year myvalue
1 1995       1
2 1996       3
3 1997       4
4 1998      NA
5 1999       6
6 2000      10
> subset(dat, year > 1997 & myvalue != NA)
[1] year    myvalue
<0 rows> (or 0-length row.names)
> subset(dat, year > 1997 & !is.na(myvalue))
  year myvalue
5 1999       6
6 2000      10
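As a minimal sketch of the rule behind this (the vector x below is made up purely for illustration), comparing against NA never yields TRUE or FALSE, whereas is.na() does:

```r
# Hypothetical vector, for illustration only
x <- c(1, NA, 3)

eq_na      <- x == NA    # all NA, even at the position where x itself is NA
neq_na     <- x != NA    # likewise all NA
is_missing <- is.na(x)   # FALSE  TRUE FALSE
is_present <- !is.na(x)  #  TRUE FALSE  TRUE

print(eq_na)
print(is_missing)
print(x[is_present])     # keeps only the non-missing values
```

So any filter built on == NA or != NA carries no information about which values are missing; is.na() is the test that does.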

It is instructive to ponder further on why your version doesn't work.

The first part of the clause returns:

> with(dat, year > 1997)
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

For the first 3 elements we don't need to do any further checking as they are FALSE, but we need to check the second clause for the final three elements in the example. The second clause returns NA for all elements, as discussed above:

> with(dat, myvalue != NA)
[1] NA NA NA NA NA NA

Hence the combined clause returns:

> with(dat, year > 1997 & myvalue != NA)
[1] FALSE FALSE FALSE    NA    NA    NA

which ends up selecting no rows, because subset() treats NA in the condition as FALSE; hence the zero-row object returned for your example.
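The combined result follows R's three-valued logic: FALSE & NA is FALSE, since the answer is FALSE whatever the unknown value turns out to be, while TRUE & NA remains NA. A rough sketch, rebuilding dat so that the block runs on its own (the intermediate variable names are mine, not part of the original answer), also contrasts subset() with plain `[` indexing, which keeps a row of NAs for every NA in the condition:

```r
dat <- data.frame(year = 1995:2000, myvalue = c(1, 3, 4, NA, 6, 10))

false_and_na <- FALSE & NA   # FALSE: the outcome does not depend on the unknown
true_and_na  <- TRUE & NA    # NA: the outcome depends on the unknown

# The condition from the question: FALSE FALSE FALSE NA NA NA
cond <- with(dat, year > 1997 & myvalue != NA)

n_subset  <- nrow(subset(dat, cond))  # subset() counts NA as FALSE
n_bracket <- nrow(dat[cond, ])        # `[` returns one all-NA row per NA

# The corrected condition contains no NA, so both approaches would agree
good <- subset(dat, year > 1997 & !is.na(myvalue))
print(good)
```

With the question's condition, subset() silently returns nothing while `[` returns three rows of NA, so neither fails loudly; testing with is.na() avoids both outcomes.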
