Geometric Mean: is there a built-in?
I tried to find a built-in for the geometric mean but couldn't.
(Obviously a built-in isn't going to save me any time while working in the shell, nor do I suspect there's any difference in accuracy; but for scripts I try to use built-ins as often as possible, where the cumulative performance gain is often noticeable.)
In case there isn't one (which I doubt is the case), here's mine:
gm_mean = function(a){prod(a)^(1/length(a))}
No, but there are a few people who have written one, such as here.
Another possibility is to use this:
exp(mean(log(x)))
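As a quick illustration (the example values here are mine), this is mathematically the same as prod(x)^(1/length(x)) but is much less prone to overflow or underflow on long vectors:

x <- c(2, 8, 4)
prod(x)^(1/length(x))   # 4
exp(mean(log(x)))       # 4, identical in exact arithmetic

y <- rep(1e200, 10)
prod(y)^(1/length(y))   # Inf: the product overflows
exp(mean(log(y)))       # 1e+200: the log form stays finite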
Here is a vectorized, zero- and NA-tolerant function for calculating the geometric mean in R. The verbose mean calculation involving length(x) is necessary for the cases where x contains non-positive values.
gm_mean = function(x, na.rm = TRUE){
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}
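A quick illustration of the behaviour (the example values are mine):

gm_mean(c(1, 10, 100))        # 10
gm_mean(c(1, 10, 100, NA))    # ~5.62: the NA is dropped from the sum but still
                              # counts in length(x), shrinking the result
gm_mean(c(1, 10, 100, 0))     # ~5.62: likewise, the zero is excluded from the
                              # sum but kept in the denominator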
Thanks to @ben-bolker for noting the na.rm pass-through and to @Gregor for making sure it works correctly.
I think some of the comments relate to a false equivalence between NA values in the data and zeros. In the application I had in mind they are the same, but of course this is not generally true. Thus, if you want optional propagation of zeros, and want length(x) treated differently when NA values are removed, the following is a slightly longer alternative to the function above.
gm_mean = function(x, na.rm = TRUE, zero.propagate = FALSE){
  if(any(x < 0, na.rm = TRUE)){
    return(NaN)
  }
  if(zero.propagate){
    if(any(x == 0, na.rm = TRUE)){
      return(0)
    }
    exp(mean(log(x), na.rm = na.rm))
  } else {
    exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
  }
}
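For example (the values are mine), the two modes behave as follows:

gm_mean(c(1, 10, 100, 0))                          # ~5.62: zero kept in the denominator
gm_mean(c(1, 10, 100, 0), zero.propagate = TRUE)   # 0: a single zero zeroes the result
gm_mean(c(1, 10, 100, NA), zero.propagate = TRUE)  # 10: the NA is removed before the mean
gm_mean(c(1, -10, 100))                            # NaN: undefined for negative values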
Note that it also checks for negative values and returns a more informative and appropriate NaN, since the geometric mean is not defined for negative values (but is for zeros). Thanks to the commenters who stayed on my case about this.
We can use the psych package and call its geometric.mean function.
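A minimal sketch, assuming psych is installed (the example values are mine):

# install.packages("psych")    # if not already installed
library(psych)
geometric.mean(c(1, 10, 100))  # 10, i.e. exp(mean(log(x)))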
The exp(mean(log(x))) approach will work unless there is a 0 in x. If so, log(0) produces -Inf (negative infinity), which always results in a geometric mean of 0.
One solution is to remove the -Inf value before calculating the mean:
geo_mean <- function(data) {
  log_data <- log(data)
  gm <- exp(mean(log_data[is.finite(log_data)]))
  return(gm)
}
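For instance (the example values are mine):

geo_mean(c(1, 10, 100))      # 10
geo_mean(c(1, 10, 100, 0))   # 10: log(0) = -Inf is filtered out by is.finite()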
You can use a one-liner to do this, but it means calculating the log twice, which is inefficient.
exp(mean(log(i[is.finite(log(i))])))
I use exactly what Mark says. This way, even with tapply, you can use the built-in mean function; there is no need to define your own. For example, to compute per-group geometric means of data$value:
exp(tapply(log(data$value), data$group, mean))
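For example, with a toy data frame (the data, value, and group names match the line above; the values are mine):

data <- data.frame(value = c(1, 10, 100, 2, 8, 4),
                   group = c("a", "a", "a", "b", "b", "b"))
exp(tapply(log(data$value), data$group, mean))
#  a  b
# 10  4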
The EnvStats package has geoMean and geoSD functions.
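A short sketch, assuming EnvStats is installed (the example values are mine):

# install.packages("EnvStats")   # if not already installed
library(EnvStats)
geoMean(c(1, 10, 100))                     # 10
geoMean(c(1, 10, 100, NA), na.rm = TRUE)   # 10, with the NA dropped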
This version provides more options than the other answers.
It allows the user to distinguish between results that are not (real) numbers and those that are not available. If negative numbers are present, then the answer won't be a real number, so NaN is returned. If all values are NA, then the function returns NA_real_ instead to reflect that a real value is literally not available. This is a subtle difference, but one that might yield (slightly) more robust results.
The first optional parameter, zero.rm, is intended to allow the user to have zeros affect the output without making it zero. If zero.rm is set to FALSE and eta is set to NA_real_ (its default value), zeros have the effect of shrinking the result towards one. I don't have any theoretical justification for this - it just seems to make more sense not to ignore the zeros but to "do something" that doesn't involve automatically making the result zero. eta is a way of handling zeros that was inspired by the following discussion: https://support.bioconductor.org/p/64014/
geomean <- function(x,
                    zero.rm = TRUE,
                    na.rm = TRUE,
                    nan.rm = TRUE,
                    eta = NA_real_) {
  nan.count <- sum(is.nan(x))
  na.count <- sum(is.na(x))
  value.count <- if(zero.rm) sum(x[!is.na(x)] > 0) else sum(!is.na(x))

  # Handle cases when there are negative values, all values are missing, or
  # missing values are not tolerated.
  if ((nan.count > 0 & !nan.rm) | any(x < 0, na.rm = TRUE)) {
    return(NaN)
  }
  if ((na.count > 0 & !na.rm) | value.count == 0) {
    return(NA_real_)
  }

  # Handle cases when non-missing values are either all positive or all zero.
  # In these cases the eta parameter is irrelevant and therefore ignored.
  if (all(x > 0, na.rm = TRUE)) {
    return(exp(mean(log(x), na.rm = TRUE)))
  }
  if (all(x == 0, na.rm = TRUE)) {
    return(0)
  }

  # All remaining cases involve a mix of positive and zero values.
  # By default, we do not use an artificial constant or propagate zeros.
  if (is.na(eta)) {
    return(exp(sum(log(x[x > 0]), na.rm = TRUE) / value.count))
  }
  if (eta > 0) {
    return(exp(mean(log(x + eta), na.rm = TRUE)) - eta)
  }
  return(0)  # only propagate zeros when eta is set to 0 (or less)
}
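A few illustrative calls (the values are mine) showing how the options interact:

geomean(c(1, 10, 100))                       # 10
geomean(c(1, 10, 100, 0))                    # 10: zeros dropped by default (zero.rm = TRUE)
geomean(c(1, 10, 100, 0), zero.rm = FALSE)   # ~5.62: zeros shrink the result towards one
geomean(c(1, 10, 100, 0), eta = 1)           # ~5.87: zeros handled via the additive constant
geomean(c(1, 10, 100, 0), eta = 0)           # 0: zeros propagate
geomean(c(NA_real_, NA_real_))               # NA: no usable values
geomean(c(1, -1, 10))                        # NaN: negative values present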
Missing values in your data are not a rare case, so you need to add one more argument. You may try the following code:
exp(mean(log(i[ is.finite(log(i)) ]), na.rm = TRUE))
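For instance (the example vector is mine):

i <- c(1, 10, 100, NA, 0)
exp(mean(log(i[is.finite(log(i))]), na.rm = TRUE))   # 10: the NA and the 0 are both excluded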
exp(mean(log(x1))) == prod(x1)^(1/length(x1))
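The identity holds exactly in mathematics; in floating point the two sides can differ in the last few bits, so all.equal() is a safer check than == (the example values are mine):

x1 <- c(2, 8, 4)
exp(mean(log(x1)))                                        # 4
prod(x1)^(1/length(x1))                                   # 4
all.equal(exp(mean(log(x1))), prod(x1)^(1/length(x1)))    # TRUE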