Replace all NA with FALSE in selected columns in R
I have a question similar to this one, but my dataset is a bit bigger: 50 columns, with 1 column as UID and the other columns carrying either TRUE or NA. I want to change all the NA to FALSE, but I don't want to use an explicit loop.
Can plyr do the trick? Thanks.
UPDATE #1
Thanks for the quick reply, but what if my dataset looks like the one below?
df <- data.frame(
id = c(rep(1:19),NA),
x1 = sample(c(NA,TRUE), 20, replace = TRUE),
x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)
I only want x1 and x2 to be processed. How can this be done?
If you want to do the replacement for a subset of variables, you can still use the is.na(*) <- trick, as follows:
df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE
IMO using temporary variables makes the logic easier to follow:
vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2
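The same temporary-variable logic can be wrapped in a small helper so it can be reused for any set of columns. This is only a sketch: the function name na_to_false and its cols argument are made up for illustration and are not part of base R or any package.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package):
# replace NA with FALSE in the given columns only.
na_to_false <- function(data, cols) {
  sub <- data[cols]            # work on the selected columns only
  sub[is.na(sub)] <- FALSE     # logical-matrix indexing on the subset
  data[cols] <- sub            # write the cleaned columns back
  data
}

set.seed(1)
df <- data.frame(
  id = c(1:19, NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)

res <- na_to_false(df, c("x1", "x2"))
anyNA(res[c("x1", "x2")])  # checks whether any NA remains in x1 or x2
is.na(res$id[20])          # checks whether the NA in id was left alone
```

Because the helper returns a modified copy, the original df is unchanged unless the result is assigned back to it.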
tidyr::replace_na is an excellent function for this.
library(tidyr)
df %>%
  replace_na(list(x1 = FALSE, x2 = FALSE))
This is such a great quick fix. The only trick is that you make a list of the columns you want to change.
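With 50 columns, writing the list by hand gets tedious. As a rough sketch, the named list can be built from the column names instead. The variable names cols and repl are illustrative, and the UID column is assumed to be called id as in the example data.

```r
library(tidyr)

set.seed(1)
df <- data.frame(
  id = c(1:19, NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)

# Every column except the UID column (assumed to be named "id").
cols <- setdiff(names(df), "id")

# Named list with one FALSE per selected column, e.g. list(x1 = FALSE, x2 = FALSE).
repl <- setNames(rep(list(FALSE), length(cols)), cols)

res <- replace_na(df, repl)
```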
Try this code:
df <- data.frame(
id = c(rep(1:19), NA),
x1 = sample(c(NA, TRUE), 20, replace = TRUE),
x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
# Note: this replaces NA in every column, so the NA in id is replaced too.
replace(df, is.na(df), FALSE)
UPDATE: another solution, which leaves the id column untouched.
df2 <- df <- data.frame(
id = c(rep(1:19), NA),
x1 = sample(c(NA, TRUE), 20, replace = TRUE),
x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
df2[names(df) == "id"] <- FALSE
df2[names(df) != "id"] <- TRUE
replace(df, is.na(df) & df2, FALSE)
You can use the NAToUnknown function in the gdata package:
df[,c('x1', 'x2')] = gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')
With dplyr you could also do:
df %>% mutate_each(funs(replace(., is.na(.), FALSE)), x1, x2)
It is a bit less readable than just using replace(), but more generic, as it lets you select the columns to be transformed. This solution applies especially if you want to keep NAs in some columns but get rid of them in others.
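In more recent dplyr releases, mutate_each() and funs() are deprecated. Assuming dplyr 1.0.0 or later is installed, the same column-wise replacement can be sketched with across():

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  id = c(1:19, NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)

# Replace NA with FALSE in x1 and x2 only; id keeps its NA.
res <- df %>%
  mutate(across(c(x1, x2), ~ replace(.x, is.na(.x), FALSE)))
```

With many flag columns, a selection such as across(-id, ...) avoids listing them by name.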
An option would be to use a for loop.
for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE
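For the 50-column case in the question, the column names do not have to be typed out. A minimal sketch, assuming the UID column is called id as in the example data (the variable name cols is illustrative):

```r
set.seed(1)
df <- data.frame(
  id = c(1:19, NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)

# Every column except the UID column (assumed to be named "id").
cols <- setdiff(names(df), "id")

# Same loop as above, but over the computed column names.
for (i in cols) df[[i]][is.na(df[[i]])] <- FALSE
```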
Benchmark
set.seed(42)
df <- data.frame(
id = c(rep(1:19),NA),
x1 = sample(c(NA,TRUE), 20, replace = TRUE),
x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)
bench::mark(check=FALSE,
"Holger Brandl" = local(dplyr::mutate_each(df, dplyr::funs(replace(., is.na(.), F)), x1, x2)),
"mtelesha" = local(df <- tidyr::replace_na(df, list(x1 = FALSE, x2 = FALSE))),
Ramnath = local(df[,c('x1', 'x2')] <- gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')),
"Hong Ooi" = local(df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE),
GKi = local(for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE) )
# expression min median `itr/sec` mem_al…¹ gc/se…² n_itr n_gc total…³
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:by> <dbl> <int> <dbl> <bch:t>
#1 Holger Brandl 16.93ms 17.33ms 57.6 34.43KB 19.2 21 7 365ms
#2 mtelesha 3.94ms 4.39ms 226. 8.15KB 13.1 103 6 456ms
#3 Ramnath 400.28µs 415.44µs 2381. 1.55KB 16.7 1142 8 480ms
#4 Hong Ooi 196.87µs 206.72µs 4755. 488B 18.8 2276 9 479ms
#5 GKi 61.8µs 66.16µs 14808. 280B 20.9 7076 10 478ms
The for loop is about 3 times faster than Hong Ooi's solution, the second fastest, and allocates the least memory.