How to specify a compare/key function for sorting in R?

In Python you can use key=... to specify the key used to compare items when sorting. Is there a similar way to do this in R?


Looking at these Python key sort examples, it seems that there are two things that you might want a key for in R.

Firstly, applying a function to each element of the vector to be sorted.

x <- c("clementine", "APPLE", "Banana")

In R, you would just nest the function calls.

So rather than

sort(x, key = tolower)

you would just write

sort(tolower(x))

The other case is for sorting data frames by a particular column.

dfr <- data.frame(x = c(1, 4, 2, 5, 3), y = letters[c(5, 2, 1, 4, 3)])

Rather than

sort(dfr, key = function(row) row[2])

you would write

o <- with(dfr, order(y))
dfr[o,]
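
As a small sketch of how the same pattern extends to other sort directions (the variable names by_y, by_x_desc and by_y_desc are only illustrative), a numeric key can be negated for descending order, and decreasing = TRUE covers non-numeric keys:

```r
dfr <- data.frame(x = c(1, 4, 2, 5, 3), y = letters[c(5, 2, 1, 4, 3)])

# Ascending by column y
by_y <- dfr[order(dfr$y), ]

# Descending by column x: negate the numeric key
by_x_desc <- dfr[order(-dfr$x), ]

# Descending by column y: negation does not apply to strings,
# so use the decreasing argument instead
by_y_desc <- dfr[order(dfr$y, decreasing = TRUE), ]

print(by_y)
print(by_x_desc)
print(by_y_desc)
```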


Let me extend the excellent answer of Richie.

If you want to get the order of any key, order is the function you're looking for. Building on Richie's example:

id <- order(tolower(x))
x[id] # gives you the original vector, ordered by the lower-cased key
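
To make the difference between the two forms concrete, here is a minimal sketch using the same x as in Richie's example (the names lowered, by_key and by_key_desc are only illustrative). Sorting the transformed vector returns the transformed values, while indexing with order() returns the original elements, which is what Python's key= does:

```r
x <- c("clementine", "APPLE", "Banana")

# Sorting the transformed vector returns the lower-cased strings
lowered <- sort(tolower(x))

# Indexing with order() keeps the original elements, ordered by the key
by_key <- x[order(tolower(x))]

# Same key, reverse direction
by_key_desc <- x[order(tolower(x), decreasing = TRUE)]

print(lowered)      # "apple" "banana" "clementine"
print(by_key)       # "APPLE" "Banana" "clementine"
print(by_key_desc)  # "clementine" "Banana" "APPLE"
```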

If you want specific keys, you have to take a look at ordered factors. Say you want to order observations following the series small, bigger, biggest.

We create a data frame (V2 is sampled at random, so your rows will differ between runs):

x <- data.frame(V1=1:10,
        V2=sample(c("small","bigger","biggest"),10,TRUE)
     )

Now you can order this using:

id <- order(ordered(x$V2,levels=c("small","bigger","biggest")))
x[id,]

The function ordered() turns x$V2 into an ordered factor whose levels follow the sequence you specify. order() then gives you the ordering of that ordered factor, which you can use to sort the data frame x.

If you want to sort first on V2 and then on V1, you can give multiple arguments to order as well:

id <- order(ordered(x$V2,levels=c("small","bigger","biggest")),x$V1)
x[id,]
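
An alternative sketch, not part of the answer above: match() turns each value into its position within a custom ordering, and that integer vector can serve as the key for order(). The data here is fixed rather than sampled so that the result is reproducible, and the names size_levels, key and sorted are only illustrative.

```r
size_levels <- c("small", "bigger", "biggest")
x <- data.frame(V1 = 1:6,
                V2 = c("biggest", "small", "bigger",
                       "small", "biggest", "bigger"))

# Position of each value within the custom ordering (1, 2 or 3)
key <- match(x$V2, size_levels)

# Sort by the custom key, then by V1 within ties
id <- order(key, x$V1)
sorted <- x[id, ]
print(sorted)
```

Values that do not appear in size_levels get a key of NA and therefore end up last under order()'s defaults.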

Regarding your question: you don't need lambda expressions for that, as Richie showed. Writing x[order(tolower(x))] is the equivalent of Python's sorted(x, key=lambda s: s.lower()).

To give another example, say you have a list of vectors and you want to sort on the second value of each. In Python you would use something like sorted(x, key=lambda v: v[1]), right? In R you apply a function to each element of the list and use the result in the order call:

x <- list(x1=1:10,x2=10:1,x3=rep(5,10))
id <- order(sapply(x,function(i)i[2])) # key: second value of each vector
x[id]

General method

In R, you construct the key and use the order of that key as indices into the original object. The order function gives you an easy interface to sort on multiple keys at once, so you can build arbitrarily complex sorting keys.
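
The general method can be wrapped in a small helper. This is only a sketch: sort_by is a hypothetical function, not part of base R, and it assumes that key returns a single sortable value per element.

```r
# Hypothetical helper (not in base R): sort x by key(element)
sort_by <- function(x, key, decreasing = FALSE) {
  keys <- sapply(x, key)                   # one key per element
  x[order(keys, decreasing = decreasing)]  # index the original object
}

# Works on an atomic vector ...
words <- c("clementine", "APPLE", "Banana")
sorted_words <- sort_by(words, tolower)

# ... and on a list, here keyed on the second value of each vector
vecs <- list(x1 = 1:10, x2 = 10:1, x3 = rep(5, 10))
sorted_names <- names(sort_by(vecs, function(v) v[2]))

print(sorted_words)  # "APPLE" "Banana" "clementine"
print(sorted_names)  # "x1" "x3" "x2"
```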


By default, the sort function returns the vector elements in ascending order and drops any NAs. The order function returns an integer vector of unique positions (a permutation of the indices) that puts the elements in ascending sequence, with the NAs placed at the end. Users often choose order for "sorting" data frames and vectors because the length is preserved.

 temp=sample(1:10, 15, replace=TRUE)
 temp[c(3,12)] <- NA
 sort(temp)
# [1]  2  3  3  4  6  7  7  7  8  9  9 10 10
 order(temp)
# [1] 15  2 14  4 13  7  8 10 11  1  6  5  9  3 12
 temp
# [1]  9  3 NA  4 10  9  7  7 10  7  8 NA  6  3  2
 temp[ order(temp) ]
# [1]  2  3  3  4  6  7  7  7  8  9  9 10 10 NA NA

To modify the default numeric or alphabetic collation order, wrap a function around the argument inside order. If several arguments are passed, the result is a multi-level sort, with later arguments breaking ties in earlier ones.
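
A short sketch of both ideas, using made-up data: the na.last argument of order controls where the NAs go, and wrapping the argument (negation, nchar) changes the collation. The variable names are only illustrative.

```r
temp <- c(9, 3, NA, 4, 10)

# NAs first instead of last
na_first <- temp[order(temp, na.last = FALSE)]  # NA 3 4 9 10

# Drop NAs entirely, as sort() does by default
na_dropped <- temp[order(temp, na.last = NA)]   # 3 4 9 10

# Descending: wrap the key in a negation
descending <- temp[order(-temp)]                # 10 9 4 3 NA

# Multi-level sort: by string length first, then alphabetically
words <- c("pear", "fig", "apple", "kiwi")
by_length <- words[order(nchar(words), words)]  # "fig" "kiwi" "pear" "apple"

print(na_first)
print(na_dropped)
print(descending)
print(by_length)
```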
