Subset dataframe by an unusual relation between columns

I want to subset a data frame that has an ID column (v1, all unique) and a "linked" ID column (v2). v2 may contain NAs, but where it does, the corresponding element of v1 must not appear elsewhere in v2. The relation between the columns is also expected to be symmetric: where a row has v1 = y and v2 = x, another row must have v1 = x and v2 = y. The last criterion is that the relation is not reflexive, i.e. x != y.

I want to subset the dataframe to the elements which don't fit the expected criteria.

Here is some sample data to illustrate:

set.seed(1)
dfr <- data.frame(v1=letters,v2=rev(letters))
dfr[sample(26,10),2]<-NA
dfr[sample(26,5),2]<-sample(letters,5)


dfr
   v1   v2
1   a    z
2   b <NA>
3   c    x
4   d    w
5   e <NA>
6   f    u
7   g <NA>
8   h    s
9   i    i
10  j <NA>
11  k    p
12  l <NA>
13  m    f
14  n <NA>
15  o    l
16  p    k
17  q    j
18  r    e
19  s <NA>
20  t    g
21  u <NA>
22  v    e
23  w <NA>
24  x    q
25  y    x
26  z    a

So rows 1, 2, 11, 14, 16, and 26 all meet the criteria, and I want to identify the rest.

I have attempted some solutions using match, but the NAs are causing problems. My attempts probably also rely on the fact that in this case v2 is based on rev(v1), whereas a more general solution can't make that assumption.


If I understand correctly, here is an example that selects the rows meeting the criteria:

> subset(dfr, (is.na(v2) & !(v1%in%dfr$v2)) | !is.na(v2) & paste(v1, v2) %in% paste(dfr$v2, dfr$v1))
   v1   v2
1   a    z
2   b <NA>
9   i    i
11  k    p
14  n <NA>
16  p    k
26  z    a

# or, if rows with v1 == v2 should be excluded:
> subset(dfr, (is.na(v2) & !(v1%in%dfr$v2)) | !is.na(v2) & (v1 != v2 & paste(v1, v2) %in% paste(dfr$v2, dfr$v1)))
   v1   v2
1   a    z
2   b <NA>
11  k    p
14  n <NA>
16  p    k
26  z    a
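
Since the question asks for the rows that do not fit, one option is to store the condition as a logical vector and negate it. The following is only a sketch under a few assumptions: the data is rebuilt by hand from the printed table (because `sample()` after `set.seed(1)` may give different draws on other R versions), and the names `linked`, `ok` and `bad` are introduced here for illustration.

```r
# Rebuild the sample data exactly as printed in the question, instead of
# relying on set.seed(), whose sample() draws can differ between R versions.
linked <- c("z", NA, "x", "w", NA, "u", NA, "s", "i", NA, "p", NA, "f",
            NA, "l", "k", "j", "e", NA, "g", NA, "e", NA, "q", "x", "a")
dfr <- data.frame(v1 = letters, v2 = linked, stringsAsFactors = FALSE)

# TRUE for rows that satisfy the expected relation:
#  - v2 is NA and v1 is not referenced anywhere in v2, or
#  - v2 is present, differs from v1, and the mirrored pair (v2, v1) exists.
ok <- with(dfr,
  (is.na(v2) & !(v1 %in% v2)) |
  (!is.na(v2) & v1 != v2 & paste(v1, v2) %in% paste(v2, v1)))

# Every comparison that could yield NA is paired with a FALSE guard from
# is.na()/!is.na(), so ok contains no NAs and plain negation is safe.
bad <- dfr[!ok, ]
```

On the sample data, `which(ok)` should give rows 1, 2, 11, 14, 16 and 26, matching the rows listed in the question, and `bad` holds the remaining 20 rows. Note that `paste()` turns NA into the string "NA", which is why the `!is.na(v2)` guard is kept in front of the pair comparison.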