Remove NA values from a vector

2023-04-12 00:24 问答作者：

I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (t开发者_StackOverflow中文版he vector is all numbers), but I can't do this because of the NA values.

How can I remove the NA values so that I can compute the max?

Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.)

Setting na.rm=TRUE does just what you're asking for:

d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

d <- d[!is.na(d)]

A final note: Other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NA's cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.

The na.omit function is what a lot of the regression routines use internally:

vec <- 1:1000
vec[runif(200, 1, 1000)] <- NA
max(vec)
#[1] NA
max( na.omit(vec) )
#[1] 1000

Use discard from purrr (works with lists and vectors).

discard(v, is.na)

The benefit is that it is easy to use pipes; alternatively use the built-in subsetting function [:

v %>% discard(is.na)
v %>% `[`(!is.na(.))

Note that na.omit does not work on lists:

> x <- list(a=1, b=2, c=NA)
> na.omit(x)
$a
[1] 1

$b
[1] 2

$c
[1] NA

?max shows you that there is an extra parameter na.rm that you can set to TRUE.

Apart from that, if you really want to remove the NAs, just use something like:

myvec[!is.na(myvec)]

Just in case someone new to R wants a simplified answer to the original question

How can I remove NA values from a vector?

Here it is:

Assume you have a vector foo as follows:

foo = c(1:10, NA, 20:30)

running length(foo) gives 22.

nona_foo = foo[!is.na(foo)]

length(nona_foo) is 21, because the NA values have been removed.

Remember is.na(foo) returns a boolean matrix, so indexing foo with the opposite of this value will give you all the elements which are not NA.

You can call max(vector, na.rm = TRUE). More generally, you can use the na.omit() function.

I ran a quick benchmark comparing the two base approaches and it turns out that x[!is.na(x)] is faster than na.omit. User qwr suggested I try purrr::dicard also - this turned out to be massively slower (though I'll happily take comments on my implementation & test!)

microbenchmark::microbenchmark(
  purrr::map(airquality,function(x) {x[!is.na(x)]}), 
  purrr::map(airquality,na.omit),
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)),
  times = 1e6)

Unit: microseconds
                                                     expr    min     lq      mean median      uq       max neval cld
 purrr::map(airquality, function(x) {     x[!is.na(x)] })   66.8   75.9  130.5643   86.2  131.80  541125.5 1e+06 a  
                          purrr::map(airquality, na.omit)   95.7  107.4  185.5108  129.3  190.50  534795.5 1e+06  b 
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)) 3391.7 3648.6 5615.8965 4079.7 6486.45 1121975.4 1e+06   c

For reference, here's the original test of x[!is.na(x)] vs na.omit:

microbenchmark::microbenchmark(
    purrr::map(airquality,function(x) {x[!is.na(x)]}), 
    purrr::map(airquality,na.omit), 
    times = 1000000)


Unit: microseconds
                                              expr  min   lq      mean median    uq      max neval cld
 map(airquality, function(x) {     x[!is.na(x)] }) 53.0 56.6  86.48231   58.1  64.8 414195.2 1e+06  a 
                          map(airquality, na.omit) 85.3 90.4 134.49964   92.5 104.9 348352.8 1e+06   b

Another option using complete.cases like this:

d <- c(1, 100, NA, 10)
result <- complete.cases(d)
output <- d[result]
output
#> [1]   1 100  10
max(output)
#> [1] 100

^{Created on 2022-08-26 with reprex v2.0.2}

Remove NA values from a vector

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？