apply treats numbers as characters

I couldn't find a solution to this problem online, simple as it seems. Here it is:

#Construct test dataframe 
tf <- data.frame(1:3,4:6,c("A","A","A")) 

#Try the apply function I'm trying to use
test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1]) 

#Look at the output--all columns treated as character columns...
test

#Look at the format of the original data--the first two columns are integers. 
str(tf) 

In general terms, I want to choose which function I apply to a row/column based on the type of data that row/column contains.

Here, I want a simple mean if the column is numeric and the first unique value if the column is a character column. As you can see, apply treats all columns as characters the way I've written this function.


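Just write a specialised function and put it within sapply... don't use apply(dtf, 2, fun). Besides, your character column isn't as character-like as you may think: run getOption("stringsAsFactors") and see for yourself. When that option is TRUE (the default before R 4.0.0), data.frame() turns character vectors into factors.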

sapply(tf, class)
            X1.3             X4.6 c..A....A....A.. 
       "integer"        "integer"         "factor" 
sapply(tf, storage.mode)
            X1.3             X4.6 c..A....A....A.. 
       "integer"        "integer"        "integer"

EDIT

Or even better - use lapply:

fn <- function(x) {
  if (is.numeric(x) && !is.factor(x)) {
    mean(x)
  } else if (is.character(x)) {
    unique(x)[1]
  } else if (is.factor(x)) {
    as.character(x)[1]
  }
}

dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)
dtf2 <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = TRUE)

as.data.frame(lapply(dtf, fn))
  a b c
1 2 5 A
as.data.frame(lapply(dtf2, fn))
  a b c
1 2 5 A 
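
One reason lapply is the safer choice here can be shown with a small sketch. It assumes a hypothetical helper, summarise_col, that behaves like fn above for numeric, character and factor columns. lapply returns a list, so each result keeps its own type, while sapply simplifies the results into a single atomic vector and the means end up as strings.

```r
# Hypothetical helper (not from the answer above): mean for numeric columns,
# first value as a string for everything else.
summarise_col <- function(x) {
  if (is.numeric(x)) mean(x) else as.character(x)[1]
}

dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)

res_list <- lapply(dtf, summarise_col)  # list: every element keeps its type
res_vec  <- sapply(dtf, summarise_col)  # simplified to one atomic vector

sapply(res_list, class)  # one class per element
class(res_vec)           # a single class for the whole vector
```

Wrapping the list in as.data.frame(), as done above, keeps the per-column types in the final result.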


I find the numcolwise and catcolwise functions from the plyr package useful here, for a syntactically simple solution.

First let's name the columns, to avoid ugly column names when doing the aggregation:

tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"))

Then you get your desired result with this one-liner:

> library(plyr)
> cbind(numcolwise(mean)(tf), catcolwise(function(z) unique(z)[1])(tf))
  a b d
1 2 5 A

Explanation: numcolwise(f) converts its argument (here f is the mean function) into a function that takes a data frame and applies f only to its numeric columns. Similarly, catcolwise converts its function argument into a function that operates only on the categorical columns.
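
For readers without plyr, the same split can be sketched in base R. This is only an approximation of the idea, not how plyr implements it; is_num, num_part and cat_part are names made up for the illustration, and "categorical" is taken here to mean simply "not numeric".

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"))

# One logical flag per column: TRUE where the column is numeric
is_num <- sapply(tf, is.numeric)

# Apply a different function to each group of columns
num_part <- lapply(tf[is_num], mean)
cat_part <- lapply(tf[!is_num], function(z) as.character(unique(z)[1]))

# Combine both lists into a one-row data frame
result <- as.data.frame(c(num_part, cat_part))
result
```

Note that this puts the numeric columns first, so the column order changes if the original data frame interleaves numeric and categorical columns.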


You want to use lapply() or sapply(), not apply(). A data.frame is a list under the hood, and apply() converts it to a matrix before doing anything else. A matrix can hold only one type, so because at least one column in your data frame is non-numeric, every other column gets coerced to character as well.
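
The coercion can be checked directly. A minimal sketch, using the question's data with named columns and stringsAsFactors set explicitly so the result does not depend on the R version:

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"),
                 stringsAsFactors = FALSE)

# apply() begins by converting the data frame with as.matrix(),
# and a matrix can hold only a single type
m <- as.matrix(tf)
typeof(m)

# Iterating over the data frame as a list keeps each column's own type
col_types <- sapply(tf, typeof)
col_types
```

Here typeof(m) reports "character" for the whole matrix, while col_types still shows the first two columns as "integer".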
