apply treats numbers as characters
I couldn't find a solution for this problem online, as simple as it seems. Here it is:
#Construct test dataframe
tf <- data.frame(1:3,4:6,c("A","A","A"))
#Try the apply function I'm trying to use
test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1])
#Look at the output--all columns treated as character columns...
test
#Look at the format of the original data--the first two columns are integers.
str(tf)
In general terms, I want to choose which function I apply over a row/column based on the type of data that row/column contains. Here, I want a simple mean if the column is numeric and the first unique value if the column is a character column. As you can see, apply treats all columns as characters the way I've written this function.
Just write a specialised function and put it within sapply; don't use apply(dtf, 2, fun). Besides, your character column isn't as character-like as you may think: run getOption("stringsAsFactors") and see for yourself.
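Whether that third column ends up as a factor depends on the stringsAsFactors setting in effect when the data frame is built; the default of data.frame() changed to FALSE in R 4.0.0. Here is a minimal sketch with the setting spelled out, so the result does not depend on the R version. The names tf_chr and tf_fac are made up for illustration.

```r
# Same columns as the question's tf, built with each setting made explicit
tf_chr <- data.frame(1:3, 4:6, c("A", "A", "A"), stringsAsFactors = FALSE)
tf_fac <- data.frame(1:3, 4:6, c("A", "A", "A"), stringsAsFactors = TRUE)

# unname() drops the auto-generated column names so only the classes remain
cls_chr <- unname(sapply(tf_chr, class))  # "integer" "integer" "character"
cls_fac <- unname(sapply(tf_fac, class))  # "integer" "integer" "factor"

# A factor is stored as integer codes, which is why storage.mode reports "integer"
mode_fac <- storage.mode(tf_fac[[3]])     # "integer"
```

The output that follows is what the original tf gives when strings are converted to factors.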
sapply(tf, class)
X1.3 X4.6 c..A....A....A..
"integer" "integer" "factor"
sapply(tf, storage.mode)
X1.3 X4.6 c..A....A....A..
"integer" "integer" "integer"
EDIT
Or even better, use lapply:
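fn <- function(x) {
  # && is the scalar logical operator, which is what if() expects
  if (is.numeric(x) && !is.factor(x)) {
    mean(x)
  } else if (is.character(x)) {
    unique(x)[1]
  } else if (is.factor(x)) {
    # convert to character so the label is returned, not the integer code
    as.character(x)[1]
  }
}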
dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)
dtf2 <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = TRUE)
as.data.frame(lapply(dtf, fn))
a b c
1 2 5 A
as.data.frame(lapply(dtf2, fn))
a b c
1 2 5 A
I find the numcolwise and catcolwise functions from the plyr package useful here, for a syntactically simple solution:
First let's name the columns, to avoid ugly column names when doing the aggregation:
tf <- data.frame(a = 1:3,b=4:6, d = c("A","A","A"))
Then you get your desired result with this one-liner:
> cbind(numcolwise(mean)(tf), catcolwise( function(z) unique(z)[1] )(tf))
a b d
1 2 5 A
Explanation: numcolwise(f) converts its argument (in this case f is the mean function) into a function that takes a data frame and applies f only to the numeric columns of the data frame. Similarly, catcolwise converts its function argument into a function that operates only on the categorical columns.
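If you'd rather not depend on plyr, the same idea can be sketched in base R. This is not plyr's implementation, only a rough illustration of the "function that returns a function" pattern described above. The names num_colwise_sketch and cat_colwise_sketch are hypothetical, and treating only character and factor columns as categorical is an assumption made here.

```r
# Returns a function that applies f to the numeric columns of a data frame
num_colwise_sketch <- function(f) {
  function(df) {
    keep <- vapply(df, is.numeric, logical(1))
    as.data.frame(lapply(df[keep], f))
  }
}

# Returns a function that applies f to the character/factor columns (assumption)
cat_colwise_sketch <- function(f) {
  function(df) {
    keep <- vapply(df, function(col) is.character(col) || is.factor(col), logical(1))
    as.data.frame(lapply(df[keep], f), stringsAsFactors = FALSE)
  }
}

tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"), stringsAsFactors = FALSE)
res <- cbind(num_colwise_sketch(mean)(tf),
             cat_colwise_sketch(function(z) unique(z)[1])(tf))
res
```

Each column is tested with is.numeric() on its own, so no column is ever coerced to match another.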
You want to use lapply() or sapply(), not apply(). A data.frame is a list under the hood, which apply will try to convert to a matrix before doing anything. Since at least one column in your data frame is character, every other column also gets coerced to character in forming that matrix.
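To see the coercion directly, here is a small sketch. It rebuilds the question's data frame with named columns and stringsAsFactors = FALSE; both are choices made here for readability, not part of the original code.

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"), stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() first; a matrix holds a single
# type, so one character column turns every cell into a character string
m <- as.matrix(tf)
matrix_type <- typeof(m)                 # "character"
col_a_numeric <- is.numeric(m[, "a"])    # FALSE

# sapply()/lapply() walk the list of columns, so each column keeps its own type
per_column <- sapply(tf, is.numeric)     # a = TRUE, b = TRUE, d = FALSE
```

This is why the is.numeric(x) test inside apply(tf, 2, ...) never succeeds, while the same test inside lapply(tf, ...) does.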