Use merge() to update a data frame with values from a second data frame
I'm trying to figure out how to use merge()
to update a data frame.
Take for example the data frame foo
foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))
Which has the following values
index value
1 a 100
2 b 101
3 c NA
4 d NA
And the data frame bar
bar <- data.frame(index=c('c', 'd'), value=c(200, 201))
Which has the following values:
index value
1 c 200
2 d 201
W开发者_运维技巧hen I run the following merge()
function to update the values for c
and d
merge(foo, bar, by='index', all=T)
It results in this output:
index value.x value.y
1 a 100 NA
2 b 101 NA
3 c NA 200
4 d NA 201
I'd like the output of merge()
to avoid the creation of, in this specific example, of value.x
and value.y
but only retain the original column of value
Is there a simple way of doing this?
The optimal solution using data.table
library(data.table)
setDT(foo)
setDT(bar)
foo[bar, on="index", value:=i.value]
foo
# index value
#1: a 100
#2: b 101
#3: c 200
#4: d 201
first argument in [
data.table method is named i
thus we can refer to column from table in i
argument using i.
prefix.
Doesn't merge()
always bind columns together? Does replace()
work?
foo$value <- replace(foo$value, foo$index %in% bar$index, bar$value)
or match()
so the order matters
foo$value[match(bar$index, foo$index)] <- bar$value
I would also like to present an sql-solution using library sqldf and the R integrated sqlite-database. I like the simplicity, accuratness and power of sql.
Accurateness: since I can exactly define which object=rows I want to change without considering the order of a data.frame (foo.id = bar.id
).
Power: in WHERE after SET and WHERE (third row) I can define all conditions I want to consider to update.
Simplicity: the syntax is more readable than using index in vectors, matrix or dataframes.
library(sqldf)
# I changed index to id since index does not work.
# Obviously index is a key word in sqlite.
(foo <- data.frame(id=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA)))
(bar <- data.frame(id=c('c', 'd'), value=c(200, 201)))
sqldf(c(paste("UPDATE foo"
," SET value = (SELECT bar.value FROM bar WHERE foo.id = bar.id)"
," WHERE value IS NULL"
)
, " SELECT * FROM main.foo"
)
)
Which gives
id value
1 a 100
2 b 101
3 c 200
4 d 201
Similar issues:
r equivalent of sql update?
R sqlite: update with two tables
merge()
only merges in new data. For instance, if you had a data set of average income for a few cities, and a separate data set of the populations of those cities, you would use merge()
to merge in one set of data into the other.
Like apeescape said, replace()
is probably what you want.
Another approach could be:
Remove the NAs from the first data fram
Use rbind to append the data instead of using merge:
These are the original two data frames:
foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))
bar <- data.frame(index=c('c', 'd'), value=c(200, 201))
(1) Use the negation of is.na to remove the NAs:
foo_new <- foo[!is.na(foo$value),]
(2) Bind the data frames and you'll get the answer you were looking for
new_df <- rbind(foo_new,bar)
new_df
index value
1 a 100
2 b 101
3 c 200
4 d 201
I think the most simple way is to "mark" the value which need to be update prior to the merge.
bar$update <- TRUE
foo <- merge(foo, bar, by='index', all=T, suffixes=c("",".update"))
foo[!is.na(foo$update),]$value <- foo[!is.na(foo$update),]$value.update
foo$value.update <- NULL
foo$update <- NULL
It would be faster using 'data.table'
library(data.table)
foo <- as.data.table(foo)
bar <- as.data.table(bar)
bar[, update:=TRUE]
foo <- merge(foo, bar, by='index', all=T, suffixes=c("",".update"))
foo[!is.na(update),value:=value.update]
foo[, c("value.update","update"):=NULL]
foo
index value
1: a 100
2: b 101
3: c 200
4: d 201
精彩评论