
R: pmatch for a more difficult task

Thanks @nullglob,

I tried running it again, but my output is different. Could you tell me whether I have misused your code? Sorry if I have misunderstood how it works. I would appreciate any further advice.

 df1 <- data.frame(
    A=c("x01","x02","y03","z02","x04", "x33", "z03"),
    B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz"))

 df2 <- data.frame(
    X=c("a","b","c","d","e", "f"),
    Y=c("A01BB","A02","C02A","B04","C01GX", "xxx"))

   # the trailing "})" implies the block was wrapped in a call such as with()
   with(c(df1, df2), {
     i <- pmatch(Y,B)
     iunmatched <- which(is.na(i))
     nunmatched <- length(iunmatched)
     nexcess <- length(B) - length(X)
     data.frame(A = c(A,rep(NA,nunmatched)),
                B = c(B,rep(NA,nunmatched)),
                X = c(X[i],rep(NA,nexcess),X[iunmatched]),
                Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))
   })

       A  B  X  Y
    1  1  1  1  1
    2  2  2  2  2
    3  5  5  3  5
    4  6  3  4  3
    5  3  4  5  4
    6  4  6 NA NA
    7  7  7 NA NA
    8 NA NA  6  6
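
A note on the output above: the integers look like factor level codes rather than strings. That would be consistent with an R version older than 4.0, where `data.frame()` converts strings to factors by default and `c()` on factors returns their integer codes. The sketch below checks this for column A and then reruns the same steps with `stringsAsFactors = FALSE`. The R version is an assumption here, since the post does not state it.

```r
# Factor codes of column A: these reproduce column A of the output above
codes <- as.integer(factor(c("x01","x02","y03","z02","x04","x33","z03")))
codes

# Keep the columns as character vectors instead of factors
df1 <- data.frame(
  A = c("x01","x02","y03","z02","x04","x33","z03"),
  B = c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02","yyy","zzz"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  X = c("a","b","c","d","e","f"),
  Y = c("A01BB","A02","C02A","B04","C01GX","xxx"),
  stringsAsFactors = FALSE)

# Same steps as in the post, applied to character columns
res <- with(c(df1, df2), {
  i <- pmatch(Y, B)
  iunmatched <- which(is.na(i))
  nunmatched <- length(iunmatched)
  nexcess <- length(B) - length(X)
  data.frame(A = c(A, rep(NA, nunmatched)),
             B = c(B, rep(NA, nunmatched)),
             X = c(X[i], rep(NA, nexcess), X[iunmatched]),
             Y = c(Y[i], rep(NA, nexcess), Y[iunmatched]),
             stringsAsFactors = FALSE)
})
res
```

With character columns, `c()` simply concatenates the strings and the NA padding, so the result should contain the original values instead of codes.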

======================ORIGINAL Question=====

Thanks for the answers to my previous question (http://stackoverflow.com/q/6592214/602276).

Building on that answer, I want to use pmatch for a more difficult task.

df1 <- data.frame(
  A=c("x01","x02","y03","z02","x04", "x33", "z03"),
  B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz"))

    A       B
1 x01 A01BB01
2 x02 A02BB02
3 y03 C02AA05
4 z02 B04CC10
5 x04 C01GX02
6 x33     yyy
7 z03     zzz

My df2 is modified as follows:

df2 <- data.frame(
  X=c("a","b","c","d","e", "f"),
  Y=c("A01BB","A02","C02A","B04","C01GX", "xxx"))

  X     Y
1 a A01BB
2 b   A02
3 c  C02A
4 d   B04
5 e C01GX
6 f   xxx

The difficulty is that df1 and df2 have different numbers of rows, so I cannot simply cbind them at the start.

Moreover, some rows in df1 and df2 have no match; each of those rows should be filled with NA on the other side.

The expected output is as follows:

     A       B    X     Y
1  x01 A01BB01    a A01BB
2  x02 A02BB02    b   A02
3  y03 C02AA05    c  C02A
4  z02 B04CC10    d   B04
5  x04 C01GX02    e C01GX
6  x33     yyy   NA    NA
7  z03     zzz   NA    NA
8   NA      NA    f   xxx

Could you show me how to do this in R? Thanks a lot.

This is not exactly an elegant solution, but it seems to do the trick:

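  # with() makes the columns of both data frames visible by name
  with(c(df1, df2), {
    i <- pmatch(Y,B)                  # position in B matched by each Y (NA if none)
    iunmatched <- which(is.na(i))     # rows of df2 with no match in df1
    nunmatched <- length(iunmatched)
    nexcess <- length(B) - length(X)  # extra rows of df1 to pad with NA
    data.frame(A = c(A,rep(NA,nunmatched)),
               B = c(B,rep(NA,nunmatched)),
               X = c(X[i],rep(NA,nexcess),X[iunmatched]),
               Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))
  })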

The output should be:

     A       B    X     Y
1  x01 A01BB01    a A01BB
2  x02 A02BB02    b   A02
3  y03 C02AA05    c  C02A
4  z02 B04CC10    d   B04
5  x04 C01GX02    e C01GX
6  x33     yyy <NA>  <NA>
7  z03     zzz <NA>  <NA>
8 <NA>    <NA>    f   xxx
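
One caveat: the steps above appear to rely on the matched rows of df2 being in the same order as the rows of df1, because `X[i]` indexes X by positions in B. That holds for this example. A possible sketch that does not depend on row order is below. It inverts the `pmatch` result with `match` so that each row of df1 looks up its partner in df2. `align_partial` is a hypothetical helper name, not something from the original answer, and character columns are assumed.

```r
df1 <- data.frame(
  A = c("x01","x02","y03","z02","x04","x33","z03"),
  B = c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02","yyy","zzz"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  X = c("a","b","c","d","e","f"),
  Y = c("A01BB","A02","C02A","B04","C01GX","xxx"),
  stringsAsFactors = FALSE)

# Hypothetical helper: align df2 to df1 by partial match of Y against B
align_partial <- function(df1, df2) {
  # For each Y, the row of df1 whose B it partially matches (NA if none)
  i <- pmatch(df2$Y, df1$B)
  # Invert: for each row of df1, the row of df2 that matched it (NA if none)
  j <- match(seq_len(nrow(df1)), i)
  # Rows of df2 that matched nothing are appended at the end
  unmatched <- which(is.na(i))
  rows1 <- c(seq_len(nrow(df1)), rep(NA_integer_, length(unmatched)))
  rows2 <- c(j, unmatched)
  # An NA row index yields a row of NAs in a data frame
  out <- cbind(df1[rows1, , drop = FALSE], df2[rows2, , drop = FALSE])
  rownames(out) <- NULL
  out
}

out <- align_partial(df1, df2)
out

# Same alignment of matched rows even when df2 is in a different order
out_shuffled <- align_partial(df1, df2[c(3, 1, 6, 2, 5, 4), ])
out_shuffled
```

Since `pmatch` by default uses each element of B at most once, the inversion with `match` is unambiguous.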
