R sorts a vector of its own accord
df.sorted <- c("binned_walker1_1.grd", "binned_walker1_2.grd", "binned_walker1_3.grd",
"binned_walker1_4.grd", "binned_walker1_5.grd", "binned_walker1_6.grd",
"binned_walker2_1.grd", "binned_walker2_2.grd", "binned_walker3_1.grd",
"binned_walker3_2.grd", "binned_walker3_3.grd", "binned_walker3_4.grd",
"binned_walker3_5.grd", "binned_walker4_1.grd", "binned_walker4_2.grd",
"binned_walker4_3.grd", "binned_walker4_4.grd", "binned_walker4_5.grd",
"binned_walker5_1.grd", "binned_walker5_2.grd", "binned_walker5_3.grd",
"binned_walker5_4.grd", "binned_walker5_5.grd", "binned_walker5_6.grd",
"binned_walker6_1.grd", "binned_walker7_1.grd", "binned_walker7_2.grd",
"binned_walker7_3.grd", "binned_walker7_4.grd", "binned_walker7_5.grd",
"binned_walker8_1.grd", "binned_walker8_2.grd", "binned_walker9_1.grd",
"binned_walker9_2.grd", "binned_walker9_3.grd", "binned_walker9_4.grd",
"binned_walker10_1.grd", "binned_walker10_2.grd", "binned_walker10_3.grd")
One would expect the order of this vector to be 1:length(df.sorted), but that appears not to be the case. It looks like R internally sorts the vector according to its own logic but tries really hard to display it the way it was created (as seen in the output).
order(df.sorted)
[1] 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[26] 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Is there a way to "reset" the ordering to 1:length(df.sorted)? That way, the ordering and the output of the vector would be in sync.
Use the mixedsort (or mixedorder) functions in package gtools:
require(gtools)
mixedorder(df.sorted)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
[28] 28 29 30 31 32 33 34 35 36 37 38 39
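If installing gtools is not an option, a similar result can be sketched in base R by pulling the two numbers out of each name and ordering on them. This is only an illustration of the idea, not how mixedorder works internally; the sample vector and the regular expression are assumptions based on the naming scheme in the question.

```r
# Hypothetical sample, deliberately shuffled, in the question's naming scheme
files <- c("binned_walker10_1.grd", "binned_walker2_2.grd",
           "binned_walker1_2.grd", "binned_walker2_1.grd",
           "binned_walker1_1.grd")

# Capture the walker number and the run number as separate groups
pattern <- "^binned_walker([0-9]+)_([0-9]+)\\.grd$"
walker <- as.integer(sub(pattern, "\\1", files))
run    <- as.integer(sub(pattern, "\\2", files))

# Order numerically: first by walker, then by run within each walker
natural_order <- order(walker, run)
files[natural_order]
```

With this sample, walker10 ends up last rather than next to walker1.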
Construct it as an ordered factor:
> df.new <- ordered(df.sorted,levels=df.sorted)
> order(df.new)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
EDIT :
After @DWin's comment, I want to add that it is not even necessary to make it an ordered factor; a plain factor is enough if you give the levels in the right order:
> df.new2 <- factor(df.sorted,levels=df.sorted)
> order(df.new2)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
The difference becomes noticeable when you use those factors in a regression analysis, where ordered and unordered factors can be treated differently. The advantage of ordered factors is that they let you use comparison operators such as < and >, which sometimes makes life a lot easier.
> df.new2[5] < df.new2[10]
[1] NA
Warning message:
In Ops.factor(df.new2[5], df.new2[10]) : < not meaningful for factors
> df.new[5] < df.new[10]
[1] TRUE
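The reason both factor variants come out in creation order is that order() works on a factor's integer codes, and those codes follow the level order you supply. A minimal sketch, assuming the names are unique (duplicated levels are not allowed) and using a short hypothetical vector:

```r
# Hypothetical names, listed in the order we want to preserve
files <- c("binned_walker9_1.grd", "binned_walker10_1.grd",
           "binned_walker2_1.grd")

# Levels given in creation order, so element i gets integer code i
f <- factor(files, levels = files)
codes <- as.integer(f)

# order() on the factor sorts by those codes, not by the labels
factor_order <- order(f)
```

Here codes and factor_order are both 1 2 3, whatever the strings themselves look like.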
Isn't this simply the same thing you get with all lexicographic sorts (e.g. ls on directories), where walker10_foo sorts next to walker1_foo instead of after walker9_foo?
The easiest way around this, in my book, is to use a consistent number of digits, i.e. I would change to binned_walker01_1.grd and so on, inserting a 0 for the one-digit counts.
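A small sketch of the effect, using three hypothetical names. Collation of "_" against digits depends on the locale, so method = "radix" is used here as an assumption to force plain byte-wise (C-locale) comparison and keep the result reproducible.

```r
# Unpadded names: "0" compares lower than "_", so walker10 jumps ahead
raw <- c("binned_walker1_1.grd", "binned_walker2_1.grd",
         "binned_walker10_1.grd")
ord_raw <- order(raw, method = "radix")

# Same names with a consistent two-digit walker number
padded <- c("binned_walker01_1.grd", "binned_walker02_1.grd",
            "binned_walker10_1.grd")
ord_padded <- order(padded, method = "radix")
```

ord_raw comes out as 3 1 2, while ord_padded is 1 2 3, i.e. the padded names already sort the way they were written.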
In response to Dwin's comment on Dirk's answer: the data are always putty in your hands. "This is R. There is no if. Only how." -- Simon Blomberg
You can add a 0 like so:
df.sorted <- gsub("(walker)([[:digit:]]{1}_)", "\\10\\2", df.sorted)
If you needed to add 00, you would do it like this:
df.sorted <- gsub("(walker)([[:digit:]]{1}_)", "\\10\\2", df.sorted)
df.sorted <- gsub("(walker)([[:digit:]]{2}_)", "\\10\\2", df.sorted)
...and so on.
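Rather than chaining one gsub per digit count, the padding can also be done in a single step for any width. The helper below is a hypothetical sketch (pad_walker is not an existing function); it assumes each name contains "walker" followed directly by the number.

```r
# Hypothetical helper: zero-pad the number after "walker" to a given width
pad_walker <- function(x, width = 2) {
  # Locate the run of digits that immediately follows "walker"
  m <- regexpr("(?<=walker)[0-9]+", x, perl = TRUE)
  # Replace each matched number with its zero-padded form
  regmatches(x, m) <- sprintf(paste0("%0", width, "d"),
                              as.integer(regmatches(x, m)))
  x
}

padded2 <- pad_walker(c("binned_walker1_1.grd", "binned_walker10_2.grd"))
padded3 <- pad_walker(c("binned_walker1_1.grd", "binned_walker10_2.grd"),
                      width = 3)
```

With width = 3 this gives binned_walker001_1.grd and binned_walker010_2.grd, matching what the two chained gsub calls produce.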