R: How to do the fastest replacement in R?
I have an input data frame like this (the real one is very large, so I want to do this as fast as possible):
df1 <- data.frame(A=c(1:5), B=c(5:9), C=c(9:13))
A B C
1 1 5 9
2 2 6 10
3 3 7 11
4 4 8 12
5 5 9 13
I have a data frame with replacement codes like this (it may contain more entries than df1):
df2 <- data.frame(X=c(1:15), Y=c(101:115))
X Y
1 1 101
2 2 102
3 3 103
4 4 104
5 5 105
6 6 106
7 7 107
8 8 108
9 9 109
10 10 110
11 11 111
12 12 112
13 13 113
14 14 114
15 15 115
By matching df2$X against the values in df1$A and df1$B, I want to get a new_df1 in which df1$A and df1$B are replaced with the corresponding values of df2$Y, i.e. this new_df1:
A B C
1 101 105 9
2 102 106 10
3 103 107 11
4 104 108 12
5 105 109 13
Could you give me some guidance on how to do this quickly in R, as my data frame is very large? Many thanks.
As Thilo mentioned, Nico's answer assumes that df2 is ordered by X and that X contains every integer 1, 2, 3, .... I would prefer to use match(), as it handles the more general case:
df1 <- data.frame(A=c(1:5), B=c(5:9), C=c(9:13))
df2 <- data.frame(X=c(1:15), Y=c(101:115))
new_df1 <- df1
# match() gives, for each value, the row of df2 whose X equals it (NA if none)
new_df1$A <- df2$Y[match(df1$A, df2$X)]
new_df1$B <- df2$Y[match(df1$B, df2$X)]
A B C
1 101 105 9
2 102 106 10
3 103 107 11
4 104 108 12
5 105 109 13
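If more than two columns need recoding, the same match() lookup can be applied to a set of column names with lapply(). A minimal sketch, assuming the columns to recode are listed in a hypothetical `cols` vector:

```r
df1 <- data.frame(A = c(1:5), B = c(5:9), C = c(9:13))
df2 <- data.frame(X = c(1:15), Y = c(101:115))

# Hypothetical: the names of the columns to recode
cols <- c("A", "B")

new_df1 <- df1
# Run the same match() lookup on every selected column; assigning the
# resulting list to new_df1[cols] replaces those columns in one step
new_df1[cols] <- lapply(df1[cols], function(v) df2$Y[match(v, df2$X)])
new_df1
```

This still calls match() once per column, so it should cost about the same as writing each assignment out by hand; it just avoids repeating the line for every column.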
It's super easy! You just need to use the values as indices (offsets) into the vector.
So, for instance, to get the values of df2's Y column corresponding to the values in df1's A column, you'd write df2$Y[df1$A] (this works because each value of X equals its row number in df2).
Hence, your code will be:
df_new <- data.frame("A" = df2$Y[df1$A], "B" = df2$Y[df1$B], "C" = df1$C)
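Because this positional shortcut silently returns the wrong rows when df2 is not sorted by X or has gaps, one option is to check the assumption first and fall back to match() otherwise. A sketch under that approach (the `positional_ok` flag is a name introduced here for illustration):

```r
df1 <- data.frame(A = c(1:5), B = c(5:9), C = c(9:13))
df2 <- data.frame(X = c(1:15), Y = c(101:115))

# Positional lookup df2$Y[v] is only correct when row i of df2 has X == i
positional_ok <- isTRUE(all(df2$X == seq_len(nrow(df2))))

if (positional_ok) {
  # Fast path: plain integer indexing, no matching needed
  df_new <- data.frame(A = df2$Y[df1$A], B = df2$Y[df1$B], C = df1$C)
} else {
  # General path: find the matching row of df2 for each value
  df_new <- data.frame(A = df2$Y[match(df1$A, df2$X)],
                       B = df2$Y[match(df1$B, df2$X)],
                       C = df1$C)
}
df_new
```

Using `==` rather than identical() in the check lets it pass whether X is stored as integer or double.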
Here is another (one-liner) way of doing it.
> with(c(df2,df1),data.frame(A = Y[match(A,X)],B = Y[match(B,X)],C))
A B C
1 101 105 9
2 102 106 10
3 103 107 11
4 104 108 12
5 105 109 13
However, I am not sure whether it will be faster than the other suggestions.
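One way to settle the speed question is to time the approaches on synthetic data of a realistic size. A rough sketch using base R's system.time(); the sizes (1e6 rows, 1000 codes) are arbitrary assumptions, and timings vary by machine and R version, so no numbers are claimed here:

```r
set.seed(1)
n <- 1e6
# Synthetic data (assumed sizes); X is 1..1000 in order, so the
# positional shortcut is valid here too
df2 <- data.frame(X = 1:1000, Y = 100001:101000)
df1 <- data.frame(A = sample(1000, n, replace = TRUE),
                  B = sample(1000, n, replace = TRUE),
                  C = seq_len(n))

# system.time() evaluates its expression in the calling environment,
# so the results assigned inside it remain available afterwards
t_index <- system.time(
  r_index <- data.frame(A = df2$Y[df1$A], B = df2$Y[df1$B], C = df1$C)
)
t_match <- system.time({
  r_match <- df1
  r_match$A <- df2$Y[match(df1$A, df2$X)]
  r_match$B <- df2$Y[match(df1$B, df2$X)]
})
t_with <- system.time(
  r_with <- with(c(df2, df1),
                 data.frame(A = Y[match(A, X)], B = Y[match(B, X)], C))
)

# Make sure all three approaches agree before comparing speed
stopifnot(isTRUE(all.equal(r_index, r_match)),
          isTRUE(all.equal(r_index, r_with)))

# Elapsed (wall-clock) seconds for each approach
c(index = t_index[["elapsed"]],
  match = t_match[["elapsed"]],
  with  = t_with[["elapsed"]])
```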