开发者

R: Sort multiple columns by another data.frame?

I'm trying to make sense of how to sort one data.frame based on multiple columns in another. This question does this with vectors. Can someone suggest a way to do the equivalent with data.frames?

Here's some sample data.

x1 <- data.frame(a=1:5, b=letters[1:5], c=rnorm(5))
x2 <- data.frame(a=c(4,4,2), b=c("d", "d", "b"), d=rnorm(3))

So I want to sort x2 by the first two columns of x1. My actual data is much more complicat开发者_开发问答ed, but this replicates the idea...


It really depends on what your data really looks like. As it looks right now, you only need one column to sort, and that is easily done by:

x2[order(match(x2[,1],x1[,1])),]

If you need more than one column, this becomes a bit trickier. You will have to specify which one you want to sort first on, and which one second, eg :

x1 <- data.frame(a=rep(1:3,2), b=rep(letters[2:4],each=2), c=rnorm(6))
x2 <- data.frame(a=c(3,3,2), b=c("c", "d", "b"), d=rnorm(3))


x2[order(match(
  paste(x2[,1],x2[,2]),
  paste(x1[,1],x1[,2]))
),]

This sorts on the first column first, and then on the second. You have to keep in mind that you need all combinations in x2 also in x1. T


Attach a rank column to the relevant columns of x1:

len <- dim(x1)[1]
x1. <- cbind(x1[,1:2], rank=1:len)

Merge into x2 (this is like a SQL join; see the merge documentation for how to specify what happens if there are ambiguities such as multiple matches or no matches):

x2. <- merge(x2, x1.)

Sort:

x2.[order(x2.[,'rank']),]


This can be done precisely using plyr. Joris' answer would work fairly well, but potentially could missort when combining strings:

> paste ("A A","B")
[1] "A A B"
> paste ("A","A B")
[1] "A A B"

You can get an exact answer using join.keys and match:

x1 <- data.frame(a=rep(1:3,2), b=rep(letters[2:4],each=2), c=rnorm(6))
x2 <- data.frame(a=c(3,3,2), b=c("c", "d", "b"), d=rnorm(3))

library(plyr)
keys<-join.keys(x1,x2,c("a","b"))
matches<-match(keys$y,keys$x,nomatch=(keys$n+1))
x2[order(matches),]

This should handle most edge cases, mismatched list sizes, etc. Items without a match in both the index columns are put at the end of the list.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜