R: What is the fastest way to do replacement in R?

I have an input data frame like this (the real one is very large, so I want the replacement to be fast):

df1 <- data.frame(A=c(1:5), B=c(5:9), C=c(9:13))

  A B  C
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
5 5 9 13

I have a second data frame holding the replacement codes, like this (it may have more entries than df1):

df2 <- data.frame(X=c(1:15), Y=c(101:115))

    X   Y
1   1 101
2   2 102
3   3 103
4   4 104
5   5 105
6   6 106
7   7 107
8   8 108
9   9 109
10 10 110
11 11 111
12 12 112
13 13 113
14 14 114
15 15 115

By matching df2$X against the values in df1$A and df1$B, I want to build new_df1 by replacing df1$A and df1$B with the corresponding values in df2$Y, which gives this new_df1:

    A   B  C
1 101 105  9
2 102 106 10
3 103 107 11
4 104 108 12
5 105 109 13

Could you give me some guidance on how to do this faster in R, since my data frame is very large? Many thanks.


As Thilo mentioned, Nico's answer assumes that df2 is ordered by X and that X contains every integer 1, 2, 3, ...

I would prefer to use match(), which handles the more general case:

df1 <- data.frame(A=c(1:5), B=c(5:9), C=c(9:13))
df2 <- data.frame(X=c(1:15), Y=c(101:115))

new_df1 <- df1

new_df1$A <- df2$Y[match(df1$A,df2$X)]
new_df1$B <- df2$Y[match(df1$B,df2$X)]

    A   B  C
1 101 105  9
2 102 106 10
3 103 107 11
4 104 108 12
5 105 109 13
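
If there are many columns to recode, or some values might be missing from df2$X, the same match() idea can be wrapped in a small helper. This is only a sketch: the name recode and the choice to leave unmatched values unchanged are my own assumptions, not something the question asks for.

```r
df1 <- data.frame(A = 1:5, B = 5:9, C = 9:13)
df2 <- data.frame(X = 1:15, Y = 101:115)

# Hypothetical helper: look up each value of x in `from` and return the
# corresponding entry of `to`. Values with no match are kept as they are
# (plain match() indexing would give NA for them instead).
recode <- function(x, from, to) {
  idx <- match(x, from)
  ifelse(is.na(idx), x, to[idx])
}

cols <- c("A", "B")   # columns to recode; C is left untouched
new_df1 <- df1
new_df1[cols] <- lapply(df1[cols], recode, from = df2$X, to = df2$Y)
```

match() is vectorised, so each column costs a single lookup pass no matter how many rows it has.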


It's super easy! You just need the right offsets into the vector.

For instance, to get the Y values of df2 corresponding to the values in column A of df1, write df2$Y[df1$A]. This works here because df2$X is simply 1, 2, 3, ... in order, so each value doubles as a row position.

Hence, your code will be:

df_new <- data.frame("A" = df2$Y[df1$A], "B" = df2$Y[df1$B], "C" = df1$C)
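
To see how much this depends on the row order of df2, here is a small check using a reversed copy of the lookup table. The name df2_rev is made up for illustration.

```r
df1 <- data.frame(A = 1:5, B = 5:9, C = 9:13)
df2 <- data.frame(X = 1:15, Y = 101:115)

# Same lookup table with the rows in reverse order (illustrative only)
df2_rev <- df2[rev(seq_len(nrow(df2))), ]

by_position <- df2_rev$Y[df1$A]                    # takes rows 1..5 of df2_rev
by_match    <- df2_rev$Y[match(df1$A, df2_rev$X)]  # looks each key up in X

by_position  # 115 114 113 112 111
by_match     # 101 102 103 104 105
```

So direct indexing is fine when X is known to be 1..n in order, and match() is the safer choice otherwise.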


Here is another (one-liner) way of doing it.

> with(c(df2,df1),data.frame(A = Y[match(A,X)],B = Y[match(B,X)],C))
    A   B  C
1 101 105  9
2 102 106 10
3 103 107 11
4 104 108 12
5 105 109 13

However, I am not sure whether it will be faster than the other suggestions.
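
One way to settle the speed question is to time the candidates on data of a realistic size with system.time(). The sketch below compares the match() lookup with a named-vector lookup. The data sizes and all variable names are assumptions, and timings will vary by machine, so I have not included any numbers.

```r
set.seed(1)
n <- 1e5                                   # assumed size; raise to match your data
lookup <- data.frame(X = 1:1000, Y = 1:1000 + 100L)
big <- data.frame(A = sample(lookup$X, n, replace = TRUE),
                  C = seq_len(n))

# Candidate 1: match() gives the row of `lookup` for every value of big$A
t_match <- system.time(
  res_match <- lookup$Y[match(big$A, lookup$X)]
)

# Candidate 2: named vector indexed by the key converted to character
named <- setNames(lookup$Y, lookup$X)
t_named <- system.time(
  res_named <- unname(named[as.character(big$A)])
)

# Both approaches should produce the same values
stopifnot(all(res_match == res_named))

rbind(match = t_match, named = t_named)    # compare the "elapsed" column
```

For sub-second timings it is worth repeating each call a few times, since a single system.time() run can be noisy.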
