Recode/relevel data.frame factors with different levels
Whenever I have to recode a set of variables, I have the SPSS recode function in mind. I must admit that it's quite straightforward. There's a similar recode() function in the car package, and it does the trick, but let's suppose that I want to get things done with factor().
I have a data.frame with several variables whose values range from 1 to 7. I want to "reverse" the variable values, replacing 1s with 7s, 2s with 6s, 3s with 5s, etc. I can use the factor() function:
# create dummy factor
set.seed(100)
x <- as.factor(round(runif(100,1,7)))
y <- factor(x, levels = rev(levels(x)))
And if I run:
> levels(x)
[1] "1" "2" "3" "4" "5" "6" "7"
> levels(y)
[1] "7" "6" "5" "4" "3" "2" "1"
The problem starts when I want to recode factors that do not all have the same levels. If some factor, z, has levels c("1", "3", "4", "6", "7"), is there any way to "reverse" the levels so that 1=7, 2=6, 3=5, etc. using the factor() function?
Other efficient recoding functions would do as well!
You must provide the levels argument to factor() (as Dirk wrote):
set.seed(2342472)
( x <- round(runif(10,1,7)) )
# [1] 7 5 5 3 1 2 5 3 3 2
( xf <- as.factor(x) )
# [1] 7 5 5 3 1 2 5 3 3 2
# Levels: 1 2 3 5 7
( yf <- factor(x,levels=7:1) )
# [1] 7 5 5 3 1 2 5 3 3 2
# Levels: 7 6 5 4 3 2 1
You could do this on an existing factor too:
( yxf <- factor(xf,levels=7:1) )
# [1] 7 5 5 3 1 2 5 3 3 2
# Levels: 7 6 5 4 3 2 1
As you can see, the levels were extended in the desired order.
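It may help to see what passing levels = 7:1 does and does not change. The sketch below uses a small factor z with the incomplete levels from the question (the names z and z_reordered are only illustrative): the labels stay as they were, while the level order, and therefore the underlying integer codes, are reversed.

```r
# A factor with incomplete levels, as in the question
z <- factor(c(1, 3, 4, 6, 7))

# Supply the complete level set in reversed order
z_reordered <- factor(z, levels = 7:1)

levels(z_reordered)         # "7" "6" "5" "4" "3" "2" "1"
as.character(z_reordered)   # "1" "3" "4" "6" "7"  (labels are unchanged)
as.integer(z_reordered)     # 7 5 4 2 1            (codes follow the new order)
```

So as.integer() on the reordered factor gives the reversed scores, whereas as.character() still returns the original labels.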
Yes, just assign to levels:
R> set.seed(100)
R> x <- as.factor(round(runif(100,1,7)))
R> table(x)
x
1 2 3 4 5 6 7
3 16 20 19 18 17 7
R> levels(x) <- LETTERS[1:7]
R> table(x)
x
A B C D E F G
3 16 20 19 18 17 7
R>
If you complete the factor levels you're good to go:
df <- data.frame(x=factor(c(2,4,5,6)))
df$x <- factor(df$x, levels = 7:1)
table(df$x)
7 6 5 4 3 2 1
0 1 1 1 0 1 0
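The same idea should extend to several columns at once. This is a minimal sketch, assuming every column of the data frame is a factor on the same 1 to 7 scale; the data frame and its column names a and b are made up for illustration.

```r
# Hypothetical data frame: two factor columns with different, incomplete levels
df <- data.frame(
  a = factor(c(2, 4, 5, 6)),
  b = factor(c(1, 3, 4, 7))
)

# Give every column the same complete, reversed level set.
# df[] <- ... keeps the data.frame structure while replacing each column.
df[] <- lapply(df, factor, levels = 7:1)

levels(df$a)   # "7" "6" "5" "4" "3" "2" "1"
levels(df$b)   # "7" "6" "5" "4" "3" "2" "1"
```

If only some columns are factors on this scale, index them explicitly (for example df[cols] <- lapply(df[cols], factor, levels = 7:1), where cols is a vector of column names) rather than using df[].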
In this case, since you have numbers, why not just transform the numbers using modular arithmetic? For example:
levels(x) <- as.character((6*as.numeric(levels(x)))%%7+1)
Modify the 6 and 7 as appropriate for other ranges: for a scale from 1 to n, multiply by n - 1 and take the result modulo n.
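For a 1 to 7 scale the modular expression gives the same result as 8 minus the value, so the relabelling can be wrapped in a small helper. This is only a sketch: reverse_labels and its lo/hi arguments are hypothetical names, and it assumes the level labels are numeric strings within lo..hi. Unlike reordering, assigning to levels() changes the labels themselves, so a 1 really becomes a 7, and it works when some levels are missing.

```r
# Hypothetical helper: relabel numeric-looking levels on a lo..hi scale
reverse_labels <- function(f, lo = 1, hi = 7) {
  levels(f) <- as.character(lo + hi - as.numeric(levels(f)))
  f
}

# The modular form and the subtraction agree on 1..7
mod_form <- (6 * 1:7) %% 7 + 1
sub_form <- 8 - 1:7
isTRUE(all.equal(mod_form, sub_form))   # TRUE

# A factor with incomplete levels, as in the question
z <- factor(c(1, 3, 4, 6, 7))
z_rev <- reverse_labels(z)
as.character(z_rev)   # "7" "5" "4" "2" "1"
```

Note that the levels of z_rev end up in the order 7, 5, 4, 2, 1; wrap the result in factor(z_rev, levels = 1:7) if the full, ascending level set is wanted.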