R - check consistency of group assignment, group labels with different names

I am trying to assign sub-group membership in 4 independent cancer gene expression datasets, training on each dataset in turn, followed by testing the (metagene based) assignment in the remaining three, plus testing on the training cohort itself.

This produces group memberships for each sample in each comparison, which gives an idea of sample stability (does a given sample fall in the same cluster each time?). The problem is that the group labels can differ from comparison to comparison, so comparing the labels directly doesn't work.

In order to assess sample stability, I think I will need, for each sample, to catalogue its fellow subgroup members, but I haven't been able to conceptualise how precisely I should do this.

For what it's worth, the code below should demonstrate the problem a little more clearly than I have described above.

Thanks for reading, and any help is appreciated!

## Here we have 12 samples (A-L), all of which have congruent assignments, except sample K.
## From the two group assignments, we can see that group 1 has become group 4 in class2,
## group 2 has become group 1 etc. etc.

## How do we assess cluster membership with these differing subgroup labels?

class1<-c(1,2,3,4,1,2,3,4,1,2,3,4)
class2<-c(4,1,2,3,4,1,2,3,4,1,3,3)

names(class1)<-LETTERS[1:12]
names(class2)<-LETTERS[1:12]


Try matchClasses in e1071; some of the methods in the seriation package might also help. You need the full two-way table of classifications, though.
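
To illustrate what the "full two-way table" looks like for the example in the question, here is a minimal base-R sketch. It uses a hand-rolled row-maximum rule to pair labels; this is my own simplification, not the e1071 implementation, and the names `tab`, `mapping`, `expected` and `inconsistent` are just illustrative.

```r
# Example data from the question
class1 <- c(1,2,3,4,1,2,3,4,1,2,3,4)
class2 <- c(4,1,2,3,4,1,2,3,4,1,3,3)
names(class1) <- names(class2) <- LETTERS[1:12]

# Full two-way table: rows are class1 labels, columns are class2 labels
tab <- table(class1, class2)
print(tab)

# Assumption: pair each class1 label with the class2 label it most often co-occurs with
mapping <- apply(tab, 1, function(r) colnames(tab)[which.max(r)])

# Translate class1 into class2's labelling and flag samples that disagree
expected <- mapping[as.character(class1)]
inconsistent <- names(class1)[expected != as.character(class2)]
print(mapping)
print(inconsistent)
```

A row-maximum rule like this can pair two class1 groups with the same class2 group when the clusterings differ a lot, so it is probably only reasonable when most samples agree.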


Nice question. Thank you for framing the question so clearly. I am working on clustering myself at the moment, and parked this question for solving later.

Here is a graphical way of solving the problem.

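library(ggplot2)
# Create dummy data
# In the first instance the labels match perfectly up to renaming:
# clust1 A, B, C, D correspond to clust2 D, A, B, C
d <- data.frame(
    clust1 = LETTERS[rep(1:4, 3)],
    clust2 = LETTERS[rep(c(4,1,2,3), 3)]
)
# Point size shows how many samples share each (clust1, clust2) pair;
# after_stat(n) replaces the deprecated ..n.. notation (ggplot2 >= 3.3.0)
ggplot(d, aes(x=clust1, y=clust2)) + geom_point(stat="sum", aes(size=after_stat(n)))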


# Now modify data so that there is a single instance of imperfect matching
d$clust2[1] <- "A"
ggplot(d, aes(x=clust1, y=clust2)) + geom_point(stat="sum", aes(size=after_stat(n)))
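
The question's idea of cataloguing each sample's fellow subgroup members can also be checked numerically, without relying on the labels at all. The following is a rough sketch under my own assumptions (the names `co1`, `co2` and `agree` are made up for illustration): build a sample-by-sample co-membership matrix for each assignment and measure, per sample, how often the two matrices agree.

```r
# Example data from the question
class1 <- c(1,2,3,4,1,2,3,4,1,2,3,4)
class2 <- c(4,1,2,3,4,1,2,3,4,1,3,3)
names(class1) <- names(class2) <- LETTERS[1:12]

# TRUE where two samples share a group; group labels themselves play no role
co1 <- outer(class1, class1, "==")
co2 <- outer(class2, class2, "==")

# Per-sample stability: fraction of samples whose "same group as me?" status
# is identical in both assignments (1 means perfectly stable)
agree <- rowMeans(co1 == co2)
names(agree) <- names(class1)
print(round(agree, 2))

# The least stable sample
print(names(which.min(agree)))
```

With more than two comparisons, the same matrices could be averaged across all pairs of runs, though that goes beyond what the question's example covers.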
