开发者

Error with custom aggregate function for a cast() call in R reshape2

I want to use R to summarize numerical data in a table with non-unique rownames to a result table with unique row-names with values summarized using a custom function. The summarization logic is: use the mean of values if the ratio of the maximum to the minimum value is < 1.5, else use median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package.

# example table with non-unique row-names
tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
# melt
tab.melt <- melt(tab, id=1)
# function to summarize with logic: mean if max/min < 1.5, else median
summarize <- function(x){ifelse(max(x)/min(x)<1.5, mean(x), median(x))}
# cast with summarized values
dcast(tab.melt, gene~variable, summarize)

The last line of code above results in an error notice.

Error in vapply(indices, fun, .开发者_StackOverflowdefault) : 
  values must be type 'logical',
 but FUN(X[[1]]) result is type 'double'
In addition: Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In min(x) : no non-missing arguments to min; returning Inf

What am I doing wrong? Note that if the summarize function were to just return min(), or max(), there is no error, though there is the warning message about 'no non-missing arguments.' Thank you for any suggestion.

(The actual table I want to work with is a 200x10000 one.)


Short answer: provide a value for fill as follows acast(tab.melt, gene~variable, summarize, fill=0)

Long answer: It appears your function gets wrapped as follows, before being passed to vapply in the vaggregate function (dcast calls cast which calls vaggregate which calls vapply):

fun <- function(i) {
    if (length(i) == 0) 
        return(.default)
    .fun(.value[i], ...)
}

To find out what .default should be, this code is executed

if (is.null(.default)) {
    .default <- .fun(.value[0])
}

i.e. .value[0] is passed to the function. min(x) or max(x) returns Inf or -Inf on when x is numeric(0). However, max(x)/min(x) returns NaN which has class logical. So when vapply is executed

vapply(indices, fun, .default)

with the default value being is of class logical (used as template by vapply), the function fails when starting to return doubles.


dcast() tries to set the value of missing combination by default value.

you can specify this by fill argument, but if fill=NULL, then the value returned by fun(0-lenght vector) (i.e., summarize(numeric(0)) here) is used as default.

please see ?dcast

then, here is a workaround:

 dcast(tab.melt, gene~variable, summarize, fill=NaN)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜