Set ordering of factor levels for multiple columns in a data frame
I've loaded data from a CSV file into a data frame. Each column represents a survey question, and all of the answers are on a five-point Likert scale, with the labels: ("None", "Low", "Medium", "High", "Very High").
When I read in the data initially, R correctly interprets those values as factors but doesn't know what the ordering should be. I want to specify what the ordering is for the values so I can do some numerical calculations. I thought the following code would work:
X <- read.csv('..')
likerts <- data.frame(apply(X, 2, function(X){factor(X,
levels = c("None", "Low", "Medium", "High", "Very High"),
ordered = T)}))
What happens instead is that all of the level data gets converted into strings. How do I do this correctly?
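apply() coerces the data frame to a matrix and hands back a character matrix, so the ordered factors are already gone by the time data.frame sees them. data.frame then converts those strings again to a normal factor (or, if stringsAsFactors = FALSE, leaves them as strings). Use lapply with as.data.frame instead. A trivial example with a toy data frame: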
X <- data.frame(
var1=rep(letters[1:5],3),
var2=rep(letters[1:5],each=3)
)
likerts <- as.data.frame(lapply(X, function(X){ordered(X,
levels = letters[5:1],labels=letters[5:1])}))
> str(likerts)
'data.frame': 15 obs. of 2 variables:
$ var1: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 4 3 2 1 5 4 3 2 1 ...
$ var2: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 5 5 4 4 4 3 3 3 2 ...
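Applied to the Likert labels from the question, the same pattern might look like the sketch below. The column names q1 and q2 and the answers in them are made up for illustration, and using as.integer() to get a 1-5 score is just one possible way to do the "numerical calculations" the question mentions.

```r
# Toy survey data; q1/q2 and their answers are hypothetical.
likert_levels <- c("None", "Low", "Medium", "High", "Very High")
X <- data.frame(
  q1 = c("Low", "High", "None", "Very High"),
  q2 = c("Medium", "Medium", "Very High", "Low"),
  stringsAsFactors = TRUE
)

# lapply() keeps every column as its own vector, so the ordered class survives.
likerts <- as.data.frame(lapply(X, ordered, levels = likert_levels))

# Ordered factors can be compared against a level name ...
at_least_medium <- likerts$q1 >= "Medium"

# ... and as.integer() gives each answer's position (1-5) on the scale.
scores <- as.data.frame(lapply(likerts, as.integer))
colMeans(scores)
```

Whether averaging Likert positions is meaningful depends on the analysis; median() also works directly on an ordered factor if you would rather not treat the scale as numeric.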
On a side note, ordered() gives you an ordered factor directly, and lapply(X, ...) is a better fit than apply(X, 2, ...) for data frames.
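To see the difference between the two for yourself, here is a small check on a toy data frame (the variable names via_apply and via_lapply are only for this illustration):

```r
X <- data.frame(
  var1 = letters[1:5],
  var2 = letters[5:1],
  stringsAsFactors = TRUE
)

# apply() goes through as.matrix(), and the result is simplified to a matrix,
# so the factor class is dropped along the way.
via_apply <- apply(X, 2, function(col) ordered(col, levels = letters[5:1]))
is.matrix(via_apply)
is.character(via_apply)

# lapply() returns a list with one ordered factor per column.
via_lapply <- lapply(X, function(col) ordered(col, levels = letters[5:1]))
sapply(via_lapply, is.ordered)
```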
And the obligatory plyr solution (using Joris's example above):
> require(plyr)
> Y <- catcolwise( function(v) ordered(v, levels = letters[5:1]))(X)
> str(Y)
'data.frame': 15 obs. of 2 variables:
$ var1: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 4 3 2 1 5 4 3 2 1 ...
$ var2: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 5 5 4 4 4 3 3 3 2 ...
One nice thing about catcolwise is that it applies the function only to the columns of X that are factors, leaving the others alone. To explain what is going on: catcolwise is a function that takes a function as an argument and returns a function that operates "columnwise" on the factor columns of the data frame. So we can think of the line above as two stages: fn <- catcolwise(...); Y <- fn(X). There are also the functions colwise (operates on all columns) and numcolwise (operates only on numeric columns).
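The "function that returns a function" idea can be sketched in base R as well. The helper factor_colwise below is hypothetical and is not how plyr implements catcolwise; in particular, this sketch keeps the non-factor columns in the result unchanged, and plyr's own functions may treat them differently, so check its documentation before relying on that.

```r
# Hypothetical helper (not part of plyr): wraps f so that the returned
# function applies it to the factor columns of a data frame only.
factor_colwise <- function(f) {
  function(df, ...) {
    is_fac <- vapply(df, is.factor, logical(1))
    df[is_fac] <- lapply(df[is_fac], f, ...)
    df
  }
}

X <- data.frame(
  var1 = factor(rep(letters[1:5], 3)),
  n = 1:15
)

# Stage 1: build the column-wise function. Stage 2: apply it to the data.
fn <- factor_colwise(function(v) ordered(v, levels = letters[5:1]))
Y <- fn(X)
str(Y)
```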